
Phylogenetic Structure of Q-M378 Subclade Based On Full Y-Chromosome Sequencing


Abstract

The Q-M378 subclade, which is downstream of haplogroup Q-L275, is characterized by a wide area of distribution and a small share in the modern populations of Eurasia. The previously known phylogenetic structure of the subclade did not allow Y-chromosome SNPs to be matched to specific populations, or the possible directions of their past migrations to be reconstructed. The conducted research enabled us to form a consistent phylogenetic structure of the Q-M378 subclade, validated by analysis of SNP and STR markers and based on full Y-chromosome sequencing data obtained with next-generation sequencers. As part of the research, new phylogenetic levels were defined: Q-Y2250 (downstream of Q-M378 and including Q-L301), Q-Y2220 (downstream of Q-L245), and Q-Y2200 (downstream of Q-Y2220). SNPs that may in the future mark certain European and Asian subclusters of Q-Y2220 (including the Armenian subcluster), as well as separate branches of the Jewish cluster Q-Y2200, were also identified. The research also confirmed the connection between the distribution of the Q-M378 subclade and the migration of speakers of Indo-European languages from Central Asia via Afghanistan and Iran to the West.
The Russian Journal of Genetic Genealogy
ISSN: 1920-2997 © All rights reserved, RJGG
Received: December 14 2013; accepted: December 16 2013;
published: January 8 2014
Vladimir Gurianov1
Leon Kull2
Roman Sychev3
Vladimir Tagankin3
Vadim Urasin3
1 The Q-L275 Research Project, Russia,
2 Full Genomes Corporation, USA,
3 YFull – research group, Russia.
The Q-M378 subclade [1], downstream of haplogroup Q-L275, is present in a number of populations in Europe, Southwest (Western) [2] and Southern Asia [3], and also in Central Asia all the way to North-West China [4].
1 Y-DNA Haplogroup Q and its Subclades - 2013. Hereinafter, subclades are referenced in line with the ISOGG (International Society of Genetic Genealogy) notation, which specifies the single nucleotide polymorphism (SNP) typical of the respective subclade.
2 Cinnioğlu et al., Excavating Y-chromosome haplotype strata in Anatolia, 2003. Haplotypes 337-339, according to the predictor by Urasin, are positive for SNP M378. All samples belong to the Central Anatolian and East Anatolian regions of Turkey.
3 Sanghamitra Sengupta et al., Polarity and Temporality of High-Resolution Y-Chromosome Distributions in India Identify Both Indigenous and Exogenous Expansions and Reveal Minor Genetic Influence of Central Asian Pastoralists, Am J Hum Genet. 2006 February; 78(2): 202–221 (among the tested inhabitants of Pakistan, 2 out of 176, or 1.14%, were positive for SNP M378; SNP M378 was not identified among the sample groups in India and Eastern Asia).
4 Zhong et al., Extended Y-chromosome investigation suggests post-Glacial migrations of modern humans into East Asia via the northern route // Molecular Biology and Evolution, first published online September 13, 2010, doi: 10.1093/molbev/msq247 (among four Uyghur populations from Xinjiang, one such person was found in each of two populations: 1 out of 71 and 1 out of 18).
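The counts quoted in notes 3 and 4 can be turned into percentages with a few lines of Python. The sketch below is only an illustration: the Wilson score interval is an addition of ours, not a method used in the cited studies, and the labels "population A" and "population B" are hypothetical names for the two Uyghur samples.

```python
import math

def frequency_with_wilson(carriers, sample_size, z=1.96):
    """Return (share, lower, upper) as fractions.

    The bounds come from the Wilson score interval; z=1.96 corresponds
    to an approximate 95% interval (an assumption of this sketch).
    """
    p = carriers / sample_size
    denom = 1 + z**2 / sample_size
    centre = (p + z**2 / (2 * sample_size)) / denom
    half = z * math.sqrt(
        p * (1 - p) / sample_size + z**2 / (4 * sample_size**2)
    ) / denom
    return p, centre - half, centre + half

# Carrier counts and sample sizes as quoted in notes 3 and 4.
counts = {
    "Pakistan (Sengupta et al.)": (2, 176),
    "Uyghurs, population A (Zhong et al.)": (1, 71),
    "Uyghurs, population B (Zhong et al.)": (1, 18),
}

for label, (carriers, size) in counts.items():
    share, lower, upper = frequency_with_wilson(carriers, size)
    print(f"{label}: {100 * share:.2f}% "
          f"(interval {100 * lower:.2f}% to {100 * upper:.2f}%)")
```

With samples this small the intervals are wide, which is one reason low shares in individual populations should be read with caution.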
One of the peculiar features of the Q-M378 subclade is its relatively wide area of distribution (connected with migrations of ancestral populations of the Indo-European language family) combined with an extremely low percentage in almost all populations (modern ethnic groups) where it has been reported so far. The exception is the Jewish Diaspora (primarily Ashkenazi Jews), where the share of the Q-M378 subclade reaches 5.2 to 7 percent (Behar 2004 [5], Hammer 2009 [6]). For this reason, Q-M378 is often associated with the Middle East. However, a more comprehensive analysis of research data and of publicly available data from commercial tests points to more complex and rather non-obvious relationships between carriers of this Y-chromosome mutation over the last millennium.
5 Behar DM, Garrigan D, Kaplan ME, Mobasher Z, Rosengarten D, Karafet TM, Quintana-Murci L, Ostrer H, Skorecki K, Hammer MF (2004). "Contrasting patterns of Y chromosome variation in Ashkenazi Jewish and host non-Jewish European populations". Hum Genet 114 (4): 354–365. doi:10.1007/s00439-003-1073-7. PMID 14740294.
6 Hammer MF, Behar DM, Karafet TM, et al. (November 2009). "Extended Y chromosome haplotypes resolve multiple and unique lineages of the Jewish priesthood". Human Genetics 126 (5): 707–717. doi:10.1007/s00439-009-0727-5. PMC 2771134. PMID 19669163.
The aim of this article is to refine the phylogenetic structure of the Q-M378 subclade, based on data available from open sources and on the data of the conducted research, and to classify its major clusters (haplotypes grouped according to the following criteria: membership in the lineage defined by a single SNP (single nucleotide polymorphism), phylogenetic similarity, and geographical distribution).
Source data and methodology
Data sets for comparison
Data from the Personal Genome Project [7] and the 1000 Genomes Project [8] were used in the conducted research. Samples taken from these projects (Table 1) have the PGP and HG prefixes, respectively.
7 See also: Ball, M.P., et al., A public
resource facilitating clinical use of genomes. Proceedings of the National
Academy of Sciences, 2012. 109(30): p. 11920-11927.
8 See also: 1000 Genomes Project Consortium.
An integrated map of genetic variation from 1,092 human genomes. Nature,
2012. 491(7422): p. 56-65.
Table 1. Information based on the data from The Personal Genome Project and 1000 Genomes Project.
Sample code Population Verified origin
Bengali (BEB)
Punjabi (PJL)
Telugu (ITU)
Northern Africa
Samples HG03914, HG03652, HG03864 that
do not belong to Q-M378 subclade were used for
Additionally, data from targeted Y-chromosome sequencing of five individuals tested at Full Genomes Corporation (FGC) [9] were analyzed.
Table 2. Information based on test participants' data at Full Genomes Corporation.
Sample code Population Verified origin
Ashkenazi Jews
Eastern Europe
Ashkenazi Jews
Eastern Europe
Eastern Turkey
Khuzestan province
kozha lineage
Data sets in BAM format (BAM/SAM Specification [10]) and, in the case of PGP130, in TSV [11] format were used for the research.
The “next-generation” sequencing [12], performed by Full Genomes Corporation at the Beijing Genomics Institute on an Illumina HiSeq 2000 sequencer, has the following parameters: 50x coverage with paired-end reads of 100 base pairs. Mapped coverage was obtained for about 23 million of the approximately 59 million base pairs present in a human Y-chromosome.
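The sequencing parameters above imply a simple piece of arithmetic, sketched here in Python. The figures are those quoted in the text; treating 50x as the mean depth over the mapped region, and the function names, are assumptions made only for this illustration.

```python
# Figures quoted in the text. Treating 50x as the mean depth over the
# mapped region is an assumption of this sketch, not a stated fact.
MAPPED_BP = 23_000_000
Y_CHROMOSOME_BP = 59_000_000
READ_LENGTH_BP = 100
MEAN_DEPTH = 50

def mapped_fraction(mapped_bp, total_bp):
    """Share of the chromosome for which mapped coverage was obtained."""
    return mapped_bp / total_bp

def reads_for_depth(depth, region_bp, read_length_bp):
    """Approximate read count from depth = reads * read_length / region."""
    return depth * region_bp // read_length_bp

fraction = mapped_fraction(MAPPED_BP, Y_CHROMOSOME_BP)
reads = reads_for_depth(MEAN_DEPTH, MAPPED_BP, READ_LENGTH_BP)

print(f"Mapped share of the Y-chromosome: {100 * fraction:.1f}%")
print(f"Approximate reads over the mapped region: {reads:,}")
```

Under these assumptions, somewhat more than a third of the chromosome is covered, so SNPs in the remaining regions cannot be detected by this kind of test.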
Data processing and analysis
Clusterization of the Q-M378 subclade haplotypes (including haplotypes that belong to the upstream Q-L275 level and to downstream levels) was based on processing 222 haplotypes (67 STR markers [13]) obtained from public sources [14]. The MURKA software [15] was used to construct the phylogenetic tree.
Full Y-chromosome sequencing data were processed and analyzed using FGC software along with software developed by the YFull research group [16].
Samples belonging to the Q-L275 subclade but lacking the M378 mutation were used as a reference, along with samples of upstream and parallel subclades on a case-by-case basis. Each sample was genotyped both for the SNPs discovered during the research and for the SNPs included in the ISOGG list under the Q-L275 subclade and its downstream subclades.
The criteria for accepting a new SNP were the presence of the mutation in more than two samples and the consistency of the data, both among the new SNPs themselves and with the previously known information on the phylogenetic structure of the respective subclade.
10 An up-to-date specification version can be found at.
11 TSV (Tab Separated Values): a text format for storing and viewing tabular data.
12 Behjati & Tarpey, What is next generation sequencing?, Arch Dis Child Educ Pract Ed 2013;98:236-238 doi:10.1136/archdischild-2013-304340
13 STR markers (short tandem repeats).
14 Public project data from the Family Tree DNA website. Hereinafter, haplotypes from this source are marked as follows: FTDNA kit and haplotype number.
15 MURKA by Valery Zaporozhchenko (Research Center of Medical Genetics of the Russian Academy of Medical Sciences, Moscow, Russia).
Clusterization of Q-M378 subclade
based on SNP and STR-markers analysis
Given that SNPs assign haplotypes to clusters more specifically, primary clusterization was based on the known SNPs that define sub-levels of the Q-M378 subclade.
There are three downstream subclades currently known [17]: Q-L245, Q-L301, and Q-L327. The SNPs with an L prefix that define these subclades were identified at the Family Tree DNA lab led by Dr. Thomas Krahn.
The geography of Q-L245 distribution essentially repeats that of M378 (except for Central and Southern Asia).
The Q-L301 subclade is localized exclusively in Iran [18]. The simultaneous presence of the two subclades Q-L301 and Q-L245 among the autochthonous population of Iran and Iraq indicates that carriers of the M378 mutation have long resided among the peoples of this region [19] [20].
L327 is a private SNP, represented by a single haplotype of a Portuguese man from the Azores [21].
Another private SNP [22] is P306, found in one Indian. It was not found among the tested representatives of the Q-M378 subclades (including Q-L301) [23].
Until recently only two SNPs were acknowledged as downstream of L245 [24]: L272.1, detected in Europe (Sicily), and L315 (discovered in an East European Ashkenazi). SNP L619.2, discovered in two representatives of the Armenian Diaspora [25], is also located below L245. The relatively recent emergence of this SNP is confirmed by the existence of Armenian Diaspora representatives who show no sign of this polymorphism [26].
17 Y-DNA Haplogroup Q and its Subclades - 2013.
18 FTDNA kit 178026, M7540, M7949.
19 Nadia Al-Zahery et al., In search of the genetic footprints of Sumerians: a survey of Y-chromosome and mtDNA variation in the Marsh Arabs of Iraq (2011). This work has some data on Q haplotypes present in the Marsh Arabs (n=143) and Iraqis (n=154). Q-M378 has a frequency of 2.1% in the first case and 1.9% in the second.
20 Grugni et al., Ancient Migratory Events in the Middle East: New Clues from the Y-Chromosome Variation of Modern Iranians (2012). DOI: 10.1371/journal.pone.0041252. Among those positive for SNP M378, the following ethnic groups stand out: Khorasan Persians, 3 out of 59 (5.1%); Esfahan Persians, 1 out of 11 (9.1%); Lurs, 2 out of 50 (3.9%); Assyrians, 1 out of 39 (2.6%); Azerbaijani, 1 out of 63 (1.6%).
21 FTDNA kit 13254.
22 FTDNA kit N78873.
23 FTDNA kit 178026, M7540, 193005, 95307 respectively.
24 Both are private SNPs, i.e. each has so far been found in a single carrier: L315 (FTDNA kit 51) and L272.1 (FTDNA kit 95307). L315 may not be stable, as it was positive in the HG02291 sample.
Consequently, until very recently the Q-L245 subclade could not be clusterized using SNPs, so phylogenetic definitions and analysis of STR markers were used instead. The low-variability DYF395S1 segment of the chromosome [27] was used for clusterization (the approach was initially proposed by the Q yDNA Project [28] administrator Rebekah A. Canada), which allowed stable clusters with corresponding geographical and ethnic references to be formed.
For example, the following clusters were highlighted using this approach.
It includes four haplotypes: two Dagestanis (identifiers according to the cited publication [29]: Avar Dag 511 and Kaitag Dag06 894), a Turk [30] and an Iraqi Arab [31]. The latter belongs to the legendary tribe of Quraysh (Adnan-Modar tribal
This cluster is located closer to the root of the L245 tree than any other and is apparently the nearest to the ancestral haplotype.
It includes a whole group of haplotypes of people of various origins. The following subclusters can be distinguished within the cluster:
- Central European (most ancestral lineages are localized in Switzerland [32]; some of them are linked to a Mennonite community);
- North European (most ancestral lineages are localized in the Netherlands [33]);
- Italian (including haplotypes with partial SNP L272.1);
- Armenian;
- Southwest Asian.
It should be noted that, by the DYF395S1=15-17 attribute, a number of haplotypes without the L245 mutation could also be counted as part of the cluster, in particular haplotypes of the level described below as Q-Y2250, as well as haplotypes of the Q-L327 and Q-P306 levels. However, since we give priority to SNPs in clusterization, we do not include them. This also implies that the DYF395S1=14-17 and/or 15-17 clusters had already formed at the Q-M378 level. This hypothesis, however, can be made more specific only as the number of tested representatives of the cluster grows.
These two clusters are represented exclusively by people of Jewish origin.
Individual haplotypes with RecLOH (recombinational loss of heterozygosity) in this part of the Y-chromosome were not considered in this clusterization.
SNPs corresponding to each of the above STR-based clusters are expected to be identified in further research.
25 FTDNA kit E5340, 191379.
26 FTDNA kit 173902, 178717.
27 Vladislav Ryzhkov, Calculating time to the most recent common ancestor by separate panels of Y-STR markers, sorted by increasing mutation rate constants, The Russian Journal of Genetic Genealogy (Russian version): Vol. 3, No. 2, 2011, ISSN: 1920-2997
28 Q yDNA Project
29 Balanovsky et al., Parallel Evolution of Genes and Languages in the Caucasus Region. Molecular Biology and Evolution, 13 May 2011.
30 FTDNA kit 303617.
31 FTDNA kit 197506.
32 The SCHACKE surname appeared in Germany at least as early as the 1600s and perhaps earlier. The JAGGI surname in Switzerland goes back much further. With this DNA Project we hope to learn more about our early ancestors and where our ancestors originated. Johann Christoffel SCHACKE, the paternal ancestor of most who carry the SHOCKEY surname, was born in Kirchheimbolanden, Pfalz, Germany in 1720 to Swiss parents. He arrived in Philadelphia PA in 1737. The Anglicized version of his name became John Christopher Shockey. He and his wife Barbara had nine children between 1739 and 1756, six sons and three daughters. After Barbara died John Christopher married Anna Marie COMPTON. John Christopher and Anna Marie had one son born in 1774 or 1775. This project hopes to help identify the descendants of the seven sons of John Christopher SHOCKEY as well as learn more about his Swiss ancestors and their related families from Germany and/or Switzerland.
33 Huff/Hough Surname Project. A Dutchman named Derrick Pauluszen Hoff (1649-1730), who arrived in New Amsterdam (New York) no later than 1660, is considered to be the common ancestor of the family.
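The STR-based grouping described in this section, in which haplotypes are clustered by their DYF395S1 value and haplotypes with RecLOH are set aside, can be sketched as follows. The haplotype identifiers and marker values below are hypothetical, and the function name is ours; the sketch does not detect RecLOH itself but simply takes a list of haplotypes to exclude.

```python
from collections import defaultdict

def cluster_by_dyf395s1(haplotypes, exclude=frozenset()):
    """Group haplotype ids by their DYF395S1 value, e.g. '15-17'.

    Ids listed in `exclude` (for instance haplotypes judged to carry
    RecLOH at this marker) are left out of the clusterization.
    """
    clusters = defaultdict(list)
    for hap_id, markers in haplotypes.items():
        if hap_id in exclude:
            continue
        clusters[markers["DYF395S1"]].append(hap_id)
    return dict(clusters)

# Hypothetical haplotypes, for illustration only.
haplotypes = {
    "hap_A": {"DYF395S1": "15-17"},
    "hap_B": {"DYF395S1": "15-17"},
    "hap_C": {"DYF395S1": "14-17"},
    "hap_D": {"DYF395S1": "15-15"},  # assumed here to be a RecLOH case
}

clusters = cluster_by_dyf395s1(haplotypes, exclude={"hap_D"})
for value, members in clusters.items():
    print(f"DYF395S1={value}: {', '.join(members)}")
```

As the text stresses, such STR clusters are provisional: SNP evidence takes priority wherever it is available.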
New phylogenetic structure
of Q-M378 subclade,
upstream and parallel subclades
Processing and analysis of the full Y-chromosome sequencing data revealed a number of new single nucleotide polymorphisms and determined their positions on the Y-chromosome (according to the human genome reference sequence hg19 [34]), as well as their phylogenetic placement on the SNP tree.
The data on the new SNPs are summarized in Tables 3-5 and Diagram 1, with SNP names given in both the Y notation [35] and the Full Genomes Corporation notation [36].
34 hg19 reference sequence or GRCh37. See also: Human Genome Overview.
35 Y – SNP prefix according to YFull.
36 FGC – SNP prefix according to Full Genomes Corporation.
Diagram 1. Phylogenetic tree of Q-M378 subclade, upstream and parallel subclades.
* SNPs included in ISOGG SNP tree (2013).
* SNPs, included by ISOGG in the list of "SNPs under Investigation" or mentioned in public sources.
* SNPs explored by the YFull team and/or the Full Genomes team.
* SNPs, mentioned in public sources, are marked in green.
As can be seen from the above, the following previously undescribed levels were discovered below the L275 SNP level:
1) Q-Y1150 level, downstream of Q-L275 and parallel to Q-M378. SNPs of this level were found only in three natives of Hindustan (HG03914, HG03652, HG03864) [37].
2) Q-Y2250 level, downstream of Q-M378 and parallel to Q-L245. SNPs of this level (Table 3) were found in the Ir1 and Kz1 samples. Since the Ir1 sample is positive for SNP L301 and Kz1 is negative for it, the Q-L301 level is evidently downstream of Q-Y2250. Private SNPs of the Kz1 sample are listed in Appendix 3, and those of the Ir1 sample in Appendix 7.
3) Q-Y2220 level, downstream of Q-L245. This level combines haplotypes of the Jewish and Armenian clusters of Q-L245. All tested samples representing these clusters (AJ1, AJ2, Ar1) were positive for the SNPs of this level (see Table 4), with the exception of the PGP130 sample (Moroccan origin).
37 G.R. Magoon, R.H. Banks, C. Rottensteiner, B.E. Schrack, V.O. Tilroe, T. Robb,
A.J. Grierson, “Generation of high-resolution a priori Y-chromosome phylogenies
using ‘next-generation’ sequencing data”, 2013, doi:10.1101/00802 (in prepara-
tion, preprint on
4) Q-Y2220 (xQ-Y2200) level, parallel to Q-Y2200, which contains the SNPs defining the Armenian segment of the DYF395S1=15-17 cluster. Because these SNPs were found in only one sample (Ar1), they have the status of private SNPs. Nevertheless, the following can be assumed with high probability:
- some of these SNPs will prove characteristic of a rather wide range of haplotypes of the DYF395S1=15-17 cluster;
- the Q-L619.2 level will be downstream of Q-Y2220 (xQ-Y2200), since only some of the Armenians who are positive for SNP L245 belong to it. The Ar1 sample tested by us showed no sign of the L619.2 mutation.
5) Q-Y2200 level, downstream of Q-Y2220. SNPs of this level define the Jewish cluster of Q-L245 (see Table 5). Private SNPs of samples AJ1 and AJ2 are listed in Appendices 5 and 6. In addition, neither tested sample had the L315 mutation.
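The reasoning used in item 2 above, where Q-L301 is placed downstream of Q-Y2250 because its carriers form a subset of the Y2250 carriers, can be written down as a small set comparison. This is a minimal sketch of that logic only; the function name is ours, and the real placement also relied on the consistency checks described in the methodology.

```python
def is_downstream(child_positive, parent_positive):
    """True if every carrier of the child SNP also carries the parent SNP
    and at least one parent carrier lacks the child SNP (proper subset)."""
    return bool(child_positive) and child_positive < parent_positive

# Calls reported in the text: Y2250 is positive in Ir1 and Kz1,
# L301 is positive in Ir1 and negative in Kz1.
positive = {
    "Y2250": {"Ir1", "Kz1"},
    "L301": {"Ir1"},
}

if is_downstream(positive["L301"], positive["Y2250"]):
    print("L301 is placed downstream of Y2250")
```

With only two samples the inference is as strong as the calls themselves, so a single additional L301-positive, Y2250-negative sample would overturn it.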
Table 3. Q-Y2250 level. New SNPs, downstream of positive SNP M378.
Position (hg19) Ancestral value Derived value SNP name (Y) SNP name (FGC) or synonym
7115834 C T Y2244 FGC4626
6894323 C T Y2245 PR683
3544336 C G Y2246 FGC4613
2765038 T G Y2247 FGC4607
4070598 G A Y2248 FGC4618
4242831 A G Y2249 FGC4619
4852955 G A Y2250 FGC4620
6537988 A G Y2251 FGC4624
6724553 C T Y2252
8671530 A G Y2255 FGC4631
10077457 T C Y2256 FGC4635
15766997 A C Y2263 FGC4646
18169503 A C Y2264 FGC4656
18803364 C T Y2265 FGC4657
18990293 A G Y2266 FGC4659
22525954 AT A Y2268
23956540 A T Y2269 FGC4675
24452225 G C Y2270 FGC4676
15684681 A T CTS4507
13643442 T C FGC4638
Note: Y2268 – deletion.
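For readers who want to work with these tables programmatically, the rows can be parsed with a few lines of Python. The sketch below assumes the column order position, ancestral value, derived value, then one or two SNP names; the class and function names are ours, and the insertion or deletion label is inferred only from the allele lengths.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SnpRow:
    position: int      # hg19 coordinate on the Y-chromosome
    ancestral: str
    derived: str
    names: List[str]   # Y and/or FGC names, in the order printed

    @property
    def kind(self) -> str:
        """Classify the variant by comparing allele lengths."""
        if len(self.ancestral) > len(self.derived):
            return "deletion"
        if len(self.ancestral) < len(self.derived):
            return "insertion"
        return "substitution"

def parse_row(line: str) -> SnpRow:
    """Split one whitespace-separated table row into its fields."""
    position, ancestral, derived, *names = line.split()
    return SnpRow(int(position), ancestral, derived, names)

# Three rows copied from Table 3.
rows = [parse_row(line) for line in (
    "7115834 C T Y2244 FGC4626",
    "6724553 C T Y2252",
    "22525954 AT A Y2268",
)]

for row in rows:
    print(row.position, row.kind, "/".join(row.names))
```

Keeping the names as a list avoids guessing which column a lone name belongs to when only one of the two is printed.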
Table 4. Q-Y2220 level. New SNPs, downstream of positive SNP L245.
Position (hg19) Ancestral value Derived value SNP name (Y) SNP name (FGC)
Table 5. Q-Y2200 level. New SNPs, downstream of positive SNP L245.
Position (hg19) Ancestral value Value positive to SNP SNP name (Y) SNP name (FGC)
23646920 C T Y2196 FGC1934
22953894 A G Y2197 FGC1933
22825080 A G Y2198 FGC1932
22588598 C T Y2200 FGC1929
22471554 A T Y2201 FGC1928
21277083 G A Y2203 FGC1923
19425984 G A Y2206
19053060 C T Y2207 FGC1919
18207170 A G Y2208 FGC1918
18046486 T C Y2210 FGC1916
18043999 G A Y2211 FGC1915
16994660 T A Y2212 FGC1914
15834557 G A Y2213 FGC1912
14385853 T G Y2215 FGC1911
14353022 A C Y2216 FGC1910
14184253 C A Y2218 FGC1909
9892635 C T Y2219 FGC1906
9401947 C A Y2221 FGC1903
8662585 C A Y2224 FGC1899
6949449 C T Y2225 FGC1897
4606181 C T Y2231 FGC1890
3995524 G A Y2232 FGC1888
3148720 A G Y2233 FGC1886
The placement of the SNPs listed by ISOGG as SNPs under Investigation was refined within the scope of this work: F108, F803, F815, F1082, F1126, F1169, F1213, F1337, F1349, F1528, F1537, F1594, F1734, F1780, F1836, F1839, F1858, F1875, F1974, F2023, F2145, F2230, F2313, F2343, F2440, F2628, F2657, F2777, F2851, F2877, F2894, F2934, F3084, F3121, F3193, F3207, F3389, F3621, F3680. On May 8, 2013 all of the above SNPs were classified by ISOGG as belonging to level L245 or below. Our analysis showed that this scheme needs to be modified. All of these SNPs except F1213, F1349, F1594, F1734, F1780, F1836, F1839, F2230, and F2877 belong to level Q-L275, as they are positive in samples HG03914, HG03652, HG03864, AJ1, AJ2, Ar1, and Ir1. The remaining SNPs, in turn, are positive in all samples in the research that are positive for M378 and L245. Consequently, these SNPs are at the same level as Q-L275 and Q-M378, respectively [38].
In addition, a considerable number of new SNPs were discovered at the same level as L275, M378 and L245.
For example, the following SNPs belong to level Q-L275: Y1014-Y1022, Y1024-Y1057, Y1059-Y1069, Y1071-Y1137, Y1139, Y1142, Y1153, Y1160, Y1164, Y1166, Y1167, Y1169, Y1195, Y1220, Y1240, Y1978-Y1983, Y1985-Y1989, Y1991-Y1993, Y1995, Y1996-Y1997, Y2003, Y2005-Y2007, Y2009, Y2239, Y2243;
to level Q-M378: Y2012, Y2013, Y2016-Y2082, Y2084-Y2095, Y2097, Y2098, Y2113-Y2115, Y2226, Y2361 (Appendix 1, Table 6);
to level Q-L245: Y2116-Y2149, Y2195, Y2199, Y2204, Y2217, Y2222, Y2223, Y2235, Y2237 (Appendix 2, Table 7).
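SNP lists written as ranges, like the ones above, are easy to expand and count in code. The following is a minimal sketch with a helper name of our own; it accepts ranges written either with or without the prefix on the second bound (Y2116-Y2149 or Y2116-2149).

```python
import re

def expand_snp_list(text):
    """Expand 'Y2116-Y2149, Y2195' into a list of individual SNP names."""
    names = []
    for item in (part.strip() for part in text.split(",")):
        if not item:
            continue
        # A range: prefix + number, a hyphen, optional prefix + number.
        match = re.fullmatch(r"([A-Za-z]+)(\d+)-(?:[A-Za-z]+)?(\d+)", item)
        if match:
            prefix = match.group(1)
            start, end = int(match.group(2)), int(match.group(3))
            names.extend(f"{prefix}{n}" for n in range(start, end + 1))
        else:
            names.append(item)
    return names

# The Q-L245 level list as given in the text.
L245_LEVEL = ("Y2116-Y2149, Y2195, Y2199, Y2204, Y2217, "
              "Y2222, Y2223, Y2235, Y2237")

names = expand_snp_list(L245_LEVEL)
print(len(names), "SNPs at the Q-L245 level")
```

Comparing the length of the expanded list with the number of rows in the corresponding appendix table gives a quick consistency check between the text and the tables.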
These SNPs do not at the moment have any phylogenetic meaning, but one may be assigned to them later, after full sequencing of samples that belong to these levels but lack the SNP mutations defining the downstream levels.
38 It should be noted that the FTDNA research team led by Dr. Thomas Krahn, with the participation of Q yDNA Project administrator Rebekah A. Canada, came to a similar conclusion earlier. The respective data can be found on the draft SNP tree page of the Family Tree DNA website. No justification of these conclusions was published, but presumably samples tested under the National Geographic Geno 2.0 project were used for the analysis.
The research demonstrated the high efficiency of full Y-chromosome sequencing for defining phylogenetic structure and made it possible to form a consistent phylogenetic structure of the Q-M378 subclade, confirmed by analysis of SNP and STR markers.
As part of the research, new phylogenetic levels were defined: Q-Y2250 (downstream of Q-M378 and including Q-L301), Q-Y2220 (downstream of Q-L245), and Q-Y2200 (downstream of Q-Y2220). SNPs that may in the future mark certain European and Asian subclusters of Q-Y2220 (including the Armenian subcluster), as well as separate branches of the Jewish cluster Q-Y2200, were also identified.
The research confirmed the connection between the distribution of the Q-M378 subclade and the migration of speakers of Indo-European languages from Central Asia via Afghanistan and Iran to the West. That said, the material currently at the researchers' disposal is not sufficient to form a complete picture of these migration processes. This task can be resolved in the near future, as statistically significant data accumulate.
The authors wish to thank the following people, who assisted in preparing the article and conducting the research:
Mikhail Edelstein (Russia)
Askar Abdullin (Kazakhstan)
Igor Bukharov (Russia)
Nazaret Chitilian (Lebanon)
Justin Allen Loe (United States)
Gregory Magoon (United States)
Appendix 1.
Table 6. SNPs at the same level with M378.
Position (hg19) Ancestral value Derived value SNP name (Y) or synonym SNP name (FGC)
2806676 A G Y2012 FGC1770
3111159 G C Y2013 FGC1758
3815203 G C Y2016 FGC1774
3929337 C A Y2017 FGC1988
4234101 A G Y2018 FGC1775
4332151 G A Y2019 FGC1776
4634427 C A Y2020 FGC1777
4775787 T C Y2021 FGC1779
4778576 A G Y2022 FGC1780
4783438 T C Y2023
4961249 C A Y2024 FGC1781
5011266 A G Y2025
5266522 A G Y2026 FGC1782
5496739 A C Y2027 FGC1783
5687522 T A Y2028 FGC1784
5751055 T G Y2029 FGC1785
5872168 C T Y2226
5963558 G A Y2030
6085717 C A Y2031 FGC1788
6430659 T G Y2032 FGC1789
6617825 T C Y2033 FGC1790
6618215 T C Y2034 FGC1791
6746675 T C Y2035 FGC1792
6774328 T C Y2036 FGC1793
6986250 T C Y2037 FGC1794
7045044 C T Y2038 FGC1795
7071796 C G Y2039 FGC1796
7094691 A G Y2040 FGC1797
7159039 C G Y2041 FGC1798
7160439 G A Y2042 FGC1799
7339849 G T Y2043 FGC1801
7431253 C T Y2044 FGC1803
7437821 C G Y2045 FGC1804
7550568 G C Y2046 FGC1805
7652630 G A Y2047
7778164 G A Y2048 FGC1807
7856334 A G Y2049 FGC1808
7952263 C T Y2050 FGC1809
8067818 C G Y2051 FGC1810
8681004 T C Y2052 FGC1812
8682184 C T Y2053 FGC1813
8821295 A G Y2054 FGC1814
9074666 C T Y2055 FGC1815
9170505 G T Y2056 FGC1817
13127815 A G Y2057 FGC1818
13928638 G C Y2058 FGC1820
14017272 A G Y2059 FGC1825
14193680 G A Y2060 FGC1827
14293849 T A Y2061 FGC1830
14435779 A G Y2062 FGC1833
14540558 C T Y2063 FGC1834
14674385 C T Y2064 FGC1835
14733633 C A Y2065 FGC1836
15498011 C A Y2066
15521110 T C Y2067 FGC1838
15699493 C T Y2068 FGC1841
16217389 A AT Y2069
16654310 C G Y2070 FGC1842
16678163 C T Y2071 FGC1843
17230548 G A Y2072 FGC1844
17447489 C T Y2073 FGC1845
17959860 A G Y2074 FGC1850
18243302 C T Y2075 FGC1852
18714407 C A Y2076 FGC1854
18768735 G T Y2077
18768736 C A Y2078
18769454 A G Y2079 FGC1767
18803642 T G Y2080 FGC1855
18856911 G C Y2081 FGC1856
19373808 A T Y2082 FGC1858
21365952 G A Y2084 FGC1861
21479863 G A Y2085 FGC1862
21647670 G C Y2086 FGC1863
21832029 C A Y2087 FGC1864
22022365 A G Y2088 FGC1865
22101157 C T Y2089 FGC1866
22440644 G A Y2361
22624047 G A Y2090 FGC1768
22931328 T A Y2091 FGC1869
23053626 A G Y2092 FGC1872
23078557 G T Y2093 FGC1873
23166596 T C Y2094 FGC1874
23279919 G T Y2095 FGC1875
23566714 C T Y2097 FGC1877
23615574 AT A Y2098
28516009 A T Y2113
28593688 T C Y2114
28687807 A G Y2115
*Note: Y2098 – deletion, Y2069 – insertion.
Appendix 2.
Table 7. SNPs at the same level with L245.
Position (hg19) Ancestral value Derived value SNP name (Y) SNP name (FGC)
2794289 C G Y2116 FGC1987
3127708 T C Y2117 FGC1771
3709585 A C Y2118 FGC1773
4502969 T C Y2119 FGC1759
4671322 C A Y2120 FGC1778
7219594 T C Y2121 FGC1800
7408851 C A Y2122 FGC1802
7590793 C T Y2123 FGC1806
8614513 C G Y2124 FGC1811
9144039 A T Y2223 FGC1901
9382621 G T Y2222 FGC1902
9798919 G A Y2125 FGC1816
13956388 G A Y2126 FGC1821
13982835 C T Y2127 FGC1823
14012662 G A Y2128 FGC1824
14045736 T C Y2129 FGC1826
14202870 A G Y2130 FGC1828
14285880 C G Y2131 FGC1829
14296099 C A Y2217 FGC1831
14402304 G A Y2132 FGC1832
15569048 C T Y2133 FGC1839
15614105 C G Y2134 FGC1840
16519324 A G Y2135
16757414 G GA Y2237
17686482 T C Y2136 FGC1846
17686883 A G Y2137 FGC1847
17763793 T A Y2138 FGC1848
17860015 G T Y2139 FGC1849
18134822 T C Y2140 FGC1851
18575106 G A Y2141 FGC1853
19300050 C T Y2142 FGC1857
21118566 T C Y2143 FGC1859
22015887 C A Y2144 FGC1989
22934317 ATC A Y2235
23010582 C T Y2145 FGC1870
23042385 C A Y2146 FGC1871
23648959 T G Y2147 FGC1878
23733052 A G Y2148 FGC1879
28520821 A G Y2149
28646637 C G Y2195 FGC1883
22767464 G A Y2199 FGC1868
21235857 A G Y2204 FGC1860
*Note: Y2235 – deletion, Y2237 – insertion.
Appendix 3.
Table 8. Private SNPs for Kz1 sample.
Position (hg19) Ancestral value Derived value SNP name (Y) SNP name (FGC)
2980949 T C YFS026208
3027441 C A YFS026210 FGC4858
3751684 G A YFS026242 FGC4859
4164029 A G YFS026250 FGC4860
4515848 G A YFS026257 FGC4862
4714529 G T YFS026264 FGC4864
5394870 T C YFS026279 FGC4865
5398133 A T YFS026280 FGC4866
6088200 T C YFS026301 FGC4867
6675390 A G YFS026321 FGC4868
7058898 G A YFS026329 FGC4869
7208802 C T YFS026339 FGC4870
7278041 G A YFS026340 FGC4871
7704050 C T YFS026351 FGC4856
7929100 A C YFS026356 FGC4872
8268654 G A YFS026361 FGC4873
8684090 G A YFS026366 FGC4874
8714870 C T YFS026367 FGC4875
9154952 G A YFS026372 FGC4876
9990725 C G FGC4878
13230336 G A FGC4879
13313894 G C FGC4880
13637299 G A FGC4881
14599760 G A YFS026426 FGC4882
15353330 C T YFS026439 FGC4883
15540398 G A YFS026445 FGC4884
15617600 G A YFS026447 FGC4885
15656595 A C YFS026448
15881099 G A YFS026457 FGC4886
17344441 A G YFS026496 FGC4887
17455705 C G YFS026499 FGC4888
17619239 A C YFS026502 FGC4889
18132430 T A YFS026506 FGC4890
18205189 C A YFS026508 FGC4891
18235952 C A YFS026509 FGC4892
18427622 C T YFS026514 FGC4893
18699065 G A YFS026522 FGC4894
19119009 G A YFS026534 FGC4895
21794826 T C YFS026585 FGC4896
21824228 C T YFS026586 FGC4897
22216997 C A YFS026594 FGC4898
22263424 G T FGC4899
22464918 G A YFS029304
22470401 G T YFS029305 FGC4901
22476862 T A FGC4902
22779292 G A YFS026598 FGC4904
22845858 T A YFS026600 FGC4905
22980932 G A YFS026603 FGC4906
23097922 G T YFS026606 FGC4907
23188736 C T YFS026608 FGC4908
23574588 G T YFS026618 FGC4909
28577678 T G FGC4857
28556325 T G YFS026709
Appendix 4.
Table 9. Private SNPs for Ar1 sample.
Position (hg19) Ancestral value Derived value SNP name (Y) SNP name (FGC)
2837084 G A YFS030295
4687602 C T YFS030307
3264534 G T YFS030298
3692600 G A YFS030300
6849037 A G YFS030309
7389018 T C YFS030314
7809088 C T YFS030318 FGC2000
8227956 C T YFS030321 FGC2001
8310172 G A YFS030322 FGC2002
8891034 A G YFS030324 FGC2003
9455617 G C YFS030326 FGC2004
9507128 G A YFS030327 FGC2005
13207417 C T FGC2006
13862984 G A YFS030335 FGC2007
14037704 A G YFS030339 FGC2008
14266100 G A YFS030343
14271743 G T YFS030344 FGC2009
14645998 A T YFS030350
15487465 T C YFS030354 FGC2010
15532493 G C YFS030355 FGC2011
15562737 G A YFS030356 FGC2012
15649426 C G YFS030357
15949197 C T YFS030358 FGC2013
16033272 G A YFS030359 FGC2014
16914913 A T YFS030368
17143642 G A YFS030370 FGC2015
17264341 C T YFS030371 FGC2016
17350212 G T YFS030372 FGC2017
17468836 G A YFS030374 FGC2018
17522056 C A YFS030375 FGC2019
17547056 C T YFS030376 FGC1986
17969724 T C YFS030377 FGC2020
18005360 G A YFS030378 FGC2021
18082500 T C YFS030379 FGC2022
18143358 C T YFS030380
18269281 T C YFS030381 FGC2023
19295864 G A YFS030386 FGC2024
19305808 C G YFS030387 FGC2025
21920836 G T YFS030396 FGC2026
22195671 T G YFS030398 FGC2027
22546195 T C YFS030431 FGC2029
23036871 A C YFS030432 FGC2030
23193319 C G YFS030433 FGC2031
23633830 T C YFS030434 FGC2032
23749442 C G YFS030435 FGC2033
23952561 G A YFS030438 FGC2034
28546577 A G YFS030460 FGC2035
28697215 C T YFS030463 FGC2036
28728861 A G YFS030465 FGC2037
28773229 G A YFS030466 FGC2038
Appendix 5.
Table 10. Private SNPs for AJ1 sample.
Position (hg19) Ancestral value Derived value SNP name (Y) SNP name (FGC)
3014878 G C YFS028077
3279492 T C YFS028084
4705139 G A YFS028121
4734829 G T YFS028122
5007712 T C YFS028135
6028097 T C YFS028158 FGC4835
6671453 T A YFS028174
6985833 G C YFS028180 FGC4836
7116693 C G YFS028187 FGC4837
13225084 C A FGC4839
13227006 C T FGC4840
14174284 C T YFS028277 FGC4841
14683323 G A YFS028303
15749472 C G YFS028328 FGC4842
15911171 T A YFS028333 FGC4843
17216758 C G YFS028365 FGC4844
17842405 G A YFS028379 FGC4845
18697269 A G YFS028399 FGC4846
22541678 G A YFS028484
22545510 G T YFS028485 FGC4850
22809218 A T YFS028490 FGC4851
22816094 C T YFS028491 FGC4852
22989959 T C YFS028498 FGC4853
23338485 T C YFS028509 FGC4854
Appendix 6.
Table 11. Private SNPs for AJ2 sample.
Position (hg19) Ancestral value Derived value SNP name (Y) SNP name (FGC)
3085515 C A YFS030088 FGC1885
4157714 C T YFS030093 FGC1889
7357489 C T YFS030117 FGC1898
8757232 C A YFS030130 FGC1900
9761433 C T YFS030140 FGC1924
16933881 C T YFS030164 FGC1913
19228285 T C YFS030189 FGC1920
21322098 A G YFS030210 FGC1924
22128896 C T YFS030218 FGC1926
22612418 A T YFS030247 FGC1930
22720359 C T YFS030248 FGC1931
Appendix 7.
Table 12. Private SNPs for Ir1 sample.
Position (hg19) Ancestral value Derived value SNP name (Y)
2808294 G A YFS030486
2848925 C T YFS030487
3241019 G A YFS030493
3331565 C T YFS030495
3617298 G A YFS030498
3905106 T C YFS030501
3983695 G A YFS030503
4048861 C G YFS030505
4976524 T C YFS030521
4976526 T C YFS030522
5021496 G C YFS030523
5219277 T A YFS030526
5844571 C T YFS030529
6531744 G A YFS030531
7398730 T C YFS030543
7685828 G T YFS030547
7997281 G C YFS030548
8350958 G A YFS030550
8482074 C G YFS030551
8874735 C A YFS030553
9459692 A G YFS030555
9832592 A G YFS030556
14022660 C A YFS030564
14273656 A G YFS030573
14401614 C T YFS030575
14532575 G T YFS030582
14916116 G A YFS030585
14996654 G A YFS030588
15012864 C A YFS030589
15240341 G C YFS030591
15799031 G C YFS030596
15933501 T A YFS030599
16253494 C T YFS030602
16280147 C T YFS030603
16304710 T C YFS030604
16875622 C T YFS030608
17529042 G A YFS030616
18106050 C T YFS030618
18903761 A C YFS030626
19157289 G A YFS030633
19198307 A T YFS030634
19526472 A C YFS030637
21359025 C G YFS030656
21567329 G A YFS030657
22564450 C T YFS030684
22621906 G T YFS030685
22687343 A T YFS030686
22910874 G A YFS030688
23018638 T C YFS030689
23054174 T G YFS030690
23198785 A T YFS030691
23435852 A C YFS030694
24484883 T C YFS030706
28759876 C T YFS030732
17188634 T C YFS030609
19001468 C T YFS030630
20534862 T C YFS030645
21599239 A G YFS030658
21836635 A T YFS030661