ABSTRACT: Heteroplasmy, the existence of multiple mtDNA types within an individual, has been previously detected by using mostly indirect methods and focusing largely on just the hypervariable segments of the control region. Next-generation sequencing technologies should enable studies of heteroplasmy across the entire mtDNA genome at much higher resolution, because many independent reads are generated for each position. However, the higher error rate associated with these technologies must be taken into consideration to avoid false detection of heteroplasmy. We used simulations and phiX174 sequence data to design criteria for accurate detection of heteroplasmy with the Illumina Genome Analyzer platform, and we used artificial mixtures and replicate data to test and refine the criteria. We then applied these criteria to mtDNA sequence reads for 131 individuals from five Eurasian populations that had been generated via a parallel tagged approach. We identified 37 heteroplasmies at 10% frequency or higher at 34 sites in 32 individuals. The mutational spectrum does not differ between heteroplasmic mutations and polymorphisms in the same individuals, but the relative mutation rate at heteroplasmic mutations is significantly higher than that estimated for all mutable sites in the human mtDNA genome. Moreover, there is also a significant excess of nonsynonymous mutations observed among heteroplasmies, compared to polymorphism data from the same individuals. Both mutation-drift and negative selection influence the fate of heteroplasmies to determine the polymorphism spectrum in humans. With appropriate criteria for avoiding false positives due to sequencing errors, next-generation technologies can provide novel insights into genome-wide aspects of mtDNA heteroplasmy.
The American Journal of Human Genetics 08/2010; 87(2):237-49. · 11.20 Impact Factor
ABSTRACT: The origins of the nearly one billion people inhabiting the Indian subcontinent and following the customs of the Hindu caste system are controversial: are they largely derived from Indian local populations (i.e. tribal groups) or from recent immigrants to India? Archaeological and linguistic evidence support the latter hypothesis, whereas recent genetic data seem to favor the former hypothesis. Here, we analyze the most extensive dataset of Indian caste and tribal Y chromosomes to date. We find that caste and tribal groups differ significantly in their haplogroup frequency distributions; caste groups are homogeneous for Y chromosome variation and more closely related to each other and to central Asian groups than to Indian tribal or any other Eurasian groups. We conclude that paternal lineages of Indian caste groups are primarily descended from Indo-European speakers who migrated from central Asia approximately 3,500 years ago. Conversely, paternal lineages of tribal groups are predominantly derived from the original Indian gene pool. We also provide evidence for bidirectional male gene flow between caste and tribal groups. In comparison, caste and tribal groups are homogeneous with respect to mitochondrial DNA variation, which may reflect the sociocultural characteristics of the Indian caste society.
Current Biology 03/2004; 14(3):231-5. · 9.49 Impact Factor
ABSTRACT: Previous studies have reported that about 85% of human diversity at Short Tandem Repeat (STR) and Restriction Fragment Length Polymorphism (RFLP) autosomal loci is due to differences between individuals of the same population, whereas differences among continental groups account for only 10% of the overall genetic variance. These findings conflict with popular notions of distinct and relatively homogeneous human races, and may also call into question the apparent usefulness of ethnic classification in, for example, medical diagnostics. Here, we present new data on 21 Alu insertions in 32 populations. We analyze these data along with three other large, globally dispersed data sets consisting of apparently neutral biallelic nuclear markers, as well as with a beta-globin data set possibly subject to selection. We confirm the previous results for the autosomal data, and find a higher diversity among continents for Y-chromosome loci. We also extend the analyses to address two questions: (1) whether differences between continental groups, although small, are nevertheless large enough to confidently assign individuals to their continent on the basis of their genotypes; (2) whether the observed genotypes naturally cluster into continental or population groups when the sample source location is ignored. Using a range of statistical methods, we show that classification errors are at best around 30% for autosomal biallelic polymorphisms and 27% for the Y chromosome. Two data sets suggest the existence of three and four major groups of genotypes worldwide, respectively, and the two groupings are inconsistent. These results suggest that, at random biallelic loci, there is little evidence, if any, of a clear subdivision of humans into biologically defined groups.
Genome Research 05/2002; 12(4):602-12. · 14.40 Impact Factor
ABSTRACT: The spread of agriculture that started in the Near East about 10 000 years ago caused a dramatic change in the European archaeological record. It is still unclear if that change was caused mostly by movement of people or by cultural transformations. In particular, there is disagreement on what proportion of the current European gene pool is derived either from the pre-agricultural, paleolithic and mesolithic people, or from neolithic farmers immigrating from the south-east. To begin to characterise the mtDNA gene pool of prehistoric Europe we examined five human remains from the Eastern Italian Alps, dated between 14 000 and 3000 years ago. Three of them yielded a sufficient amount of mtDNA for analysis. DNA extracts were prepared in two independent laboratories, and PCR products from the first hypervariable segment of the mtDNA control region were cloned and sequenced. Together with the 5200 year old 'ice man', these DNA sequences show that European mtDNA diversity was already high at the beginning of the neolithic period. All the neolithic sequences have been observed in contemporary Europeans, suggesting genealogical continuity between the neolithic and present-day European mtDNA gene pool. The mtDNA sequence from a 14 000 year-old specimen was not observed in any contemporary Europeans, raising the possibility of a lack of continuity between the mesolithic and present-day European gene pools.
European Journal of Human Genetics 10/2000; 8(9):669-77. · 4.32 Impact Factor
ABSTRACT: Two hypervariable Y-specific markers, the YCAII and DYS19 STRs, and the more stable Y Alu Polymorphism (YAP) have been analysed in about 1400 individuals of 21 different populations, mainly from Europe but also from the Middle East, Africa and Asia. On the basis of the frequency distributions of these three Y-markers we compare, using different statistical analyses, their power in detecting population genetic structure and in distinguishing closely related groups. The pattern of populations' genetic affinities inferred from the three markers considered together suggests a strong genetic structure that, with a few exceptions, broadly corresponds to the linguistic relatedness and/or geographic location of the sampled populations.
Annals of Human Genetics 04/1999; 63(Pt 2):153-66. · 2.22 Impact Factor
ABSTRACT: Genetic evidence is consistent with the view that the Indo-European languages were propagated in Europe by the diffusion of early farmers. The existence of phylogenetic relationships between European populations speaking other languages has been proposed on linguistic and archaeological grounds, and is here tested by analyzing allele frequencies at ten polymorphic protein and blood group loci. Genetic distances between speakers of Basque and Caucasian languages are compared with those between controls, i.e. contiguous populations speaking Indo-European and Altaic. Although some statistical tests show an excess of genetic similarity between Basque and South Caucasian speakers, most results do not support their common origin. If the Basques and the Caucasian-speaking populations share common ancestors, recent evolutionary phenomena must have caused divergence between them, so that their gene frequencies do not appear more similar now than those of random pairs of populations separated by the same geographic distance.
European Journal of Human Genetics 02/1995; 3(4):256-63. · 4.32 Impact Factor
ABSTRACT: Extensive genetic diversity exists in the populations of the Caucasus. Various hypotheses on its origin and evolution were tested by comparing genetic, geographic, and linguistic distances. Seventeen polymorphic loci and 107 localities were considered, and Mantel tests of matrix association were carried out. Genetic differences correlate more with linguistic than with geographic distances; but when populations are grouped by the language spoken, this correlation loses significance, whereas genetic and geographic distances between groups appear significantly associated. Hypotheses that classify North and South Caucasian languages into distinct families or that treat all North Caucasian languages as independent linguistic entities fail to account for genetic variation better than simpler models. We interpret these results as evidence for an evolutionary process in which linguistic and genetic divergence has resulted from population subdivision and from processes of elite dominance, that is, language replacement not associated with major migratory movements.
Human Biology 11/1994; 66(5):843-64. · 1.52 Impact Factor
ABSTRACT: We report genetic variation in the Caucasus, a region showing extreme linguistic differentiation. Spatial autocorrelation analysis of 31 alleles in 793 samples and maps of interpolated allele frequencies show significant geographic structure, but the patterns are clinal for only a few alleles. Many gene frequency distributions are patchy, most likely because of population subdivision and isolation by distance. Genetic boundaries tend to occur in different zones for the different alleles; significant overlap is observed, with boundaries separating different ethnic and linguistic groups. Conversely, the major geographic barriers, including the Caucasus Mountains, seem to have had little influence on the patterns and degrees of genetic differentiation. As a consequence, the genetic structure of Caucasus populations basically reflects restricted gene flow resulting from linguistic or ethnic subdivision. Genetic diversity does not provide evidence for a wavelike population expansion, such as the one associated with demic diffusion of agriculture in most of Eurasia.
Human Biology 09/1994; 66(4):639-68. · 1.52 Impact Factor