-
Kirk E Lohmueller,
Anders Albrechtsen,
Yingrui Li,
Su Yeon Kim,
Thorfinn Korneliussen,
Nicolas Vinckenbosch,
Geng Tian,
Emilia Huerta-Sanchez,
Alison F Feder,
Niels Grarup,
Torben Jørgensen,
Tao Jiang,
Daniel R Witte,
Annelli Sandbæk,
Ines Hellmann,
Torsten Lauritzen,
Torben Hansen,
Oluf Pedersen,
Jun Wang,
Rasmus Nielsen
ABSTRACT: A major question in evolutionary biology is how natural selection has shaped patterns of genetic variation across the human genome. Previous work has documented a reduction in genetic diversity in regions of the genome with low recombination rates. However, it is unclear whether other summaries of genetic variation, like allele frequencies, are also correlated with recombination rate and whether these correlations can be explained solely by negative selection against deleterious mutations or whether positive selection acting on favorable alleles is also required. Here we attempt to address these questions by analyzing three different genome-wide resequencing datasets from European individuals. We document several significant correlations between different genomic features. In particular, we find that average minor allele frequency and diversity are reduced in regions of low recombination and that human diversity, human-chimp divergence, and average minor allele frequency are reduced near genes. Population genetic simulations show that either positive natural selection acting on favorable mutations or negative natural selection acting against deleterious mutations can explain these correlations. However, models with strong positive selection on nonsynonymous mutations and little negative selection predict a stronger negative correlation between neutral diversity and nonsynonymous divergence than observed in the actual data, supporting the importance of negative, rather than positive, selection throughout the genome. Further, we show that the widespread presence of weakly deleterious alleles, rather than a small number of strongly positively selected mutations, is responsible for the correlation between neutral genetic diversity and recombination rate. This work suggests that natural selection has affected multiple aspects of linked neutral variation throughout the human genome and that positive selection is not required to explain these observations.
PLoS Genetics 10/2011; 7(10):e1002326. · 8.69 Impact Factor
-
Morten Rasmussen,
Xiaosen Guo,
Yong Wang,
Kirk E Lohmueller,
Simon Rasmussen,
Anders Albrechtsen,
Line Skotte,
Stinus Lindgreen,
Mait Metspalu,
Thibaut Jombart, [......],
Carlos D Bustamante,
Anders Krogh,
Robert A Foley,
Marta M Lahr,
Francois Balloux,
Thomas Sicheritz-Pontén,
Richard Villems,
Rasmus Nielsen,
Jun Wang,
Eske Willerslev
ABSTRACT: We present an Aboriginal Australian genomic sequence obtained from a 100-year-old lock of hair donated by an Aboriginal man from southern Western Australia in the early 20th century. We detect no evidence of European admixture and estimate contamination levels to be below 0.5%. We show that Aboriginal Australians are descendants of an early human dispersal into eastern Asia, possibly 62,000 to 75,000 years ago. This dispersal is separate from the one that gave rise to modern Asians 25,000 to 38,000 years ago. We also find evidence of gene flow between populations of the two dispersal waves prior to the divergence of Native Americans from modern Asian ancestors. Our findings support the hypothesis that present-day Aboriginal Australians descend from the earliest humans to occupy Australia, likely representing one of the oldest continuous populations outside Africa.
Science 09/2011; 334(6052):94-8. · 31.20 Impact Factor
-
Su Yeon Kim,
Kirk E Lohmueller,
Anders Albrechtsen,
Yingrui Li,
Thorfinn Korneliussen,
Geng Tian,
Niels Grarup,
Tao Jiang,
Gitte Andersen,
Daniel Witte,
Torben Jorgensen,
Torben Hansen,
Oluf Pedersen,
Jun Wang,
Rasmus Nielsen
ABSTRACT: Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost-effective approach is to use medium or low-coverage data (e.g., < 15X). However, SNP calling and allele frequency estimation in such studies are associated with substantial statistical uncertainty because of varying coverage and high error rates.
We evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data.
Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low to medium coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.
BMC Bioinformatics 06/2011; 12:231. · 2.75 Impact Factor
-
ABSTRACT: We investigate the performance of tests of neutrality in admixed populations using plausible demographic models for African-American history as well as resequencing data from African and African-American populations. The analysis of both simulated and human resequencing data suggests that recent admixture does not result in an excess of false-positive results for neutrality tests based on the frequency spectrum after accounting for the population growth in the parental African population. Furthermore, when simulating positive selection, Tajima's D, Fu and Li's D, and haplotype homozygosity have lower power to detect population-specific selection using individuals sampled from the admixed population than from the nonadmixed population. Fay and Wu's H test, however, has more power to detect selection using individuals from the admixed population than from the nonadmixed population, especially when the selective sweep ended long ago. Our results have implications for interpreting recent genome-wide scans for positive selection in human populations.
Genetics 12/2010; 187(3):823-35. · 4.01 Impact Factor
-
The American Journal of Human Genetics 06/2010; 86(6):978-80; author reply 980-1. · 10.60 Impact Factor
-
ABSTRACT: Despite the widespread study of genetic variation in admixed human populations, such as African-Americans, there has not been an evaluation of the effects of recent admixture on patterns of polymorphism or inferences about population demography. These issues are particularly relevant because estimates of the timing and magnitude of population growth in Africa have differed among previous studies, some of which examined African-American individuals. Here we use simulations and single-nucleotide polymorphism (SNP) data collected through direct resequencing and genotyping to investigate these issues. We find that when estimating the current population size and magnitude of recent growth in an ancestral population using the site frequency spectrum (SFS), it is possible to obtain reasonably accurate estimates of the parameters when using samples drawn from the admixed population under certain conditions. We also show that methods for demographic inference that use haplotype patterns are more sensitive to recent admixture than are methods based on the SFS. The analysis of human genetic variation data from the Yoruba people of Ibadan, Nigeria and African-Americans supports the predictions from the simulations. Our results have important implications for the evaluation of previous population genetic studies that have considered African-American individuals as a proxy for individuals from West Africa as well as for future population genetic studies of additional admixed populations.
Genetics 06/2010; 185(2):611-22. · 4.01 Impact Factor
-
Bridgett M vonHoldt,
John P Pollinger,
Kirk E Lohmueller,
Eunjung Han,
Heidi G Parker,
Pascale Quignon,
Jeremiah D Degenhardt,
Adam R Boyko,
Dent A Earl,
Adam Auton, [......],
Michelle Cargill,
Paul G Jones,
Zuwei Qian,
Wei Huang,
Zhao-Li Ding,
Ya-Ping Zhang,
Carlos D Bustamante,
Elaine A Ostrander,
John Novembre,
Robert K Wayne
ABSTRACT: Advances in genome technology have facilitated a new understanding of the historical and genetic processes crucial to rapid phenotypic evolution under domestication. To understand the process of dog diversification better, we conducted an extensive genome-wide survey of more than 48,000 single nucleotide polymorphisms in dogs and their wild progenitor, the grey wolf. Here we show that dog breeds share a higher proportion of multi-locus haplotypes unique to grey wolves from the Middle East, indicating that they are a dominant source of genetic diversity for dogs rather than wolves from east Asia, as suggested by mitochondrial DNA sequence data. Furthermore, we find a surprising correspondence between genetic and phenotypic/functional breed groupings but there are exceptions that suggest phenotypic diversification depended in part on the repeated crossing of individuals with novel phenotypes. Our results show that Middle Eastern wolves were a critical source of genome diversity, although interbreeding with local wolf populations clearly occurred elsewhere in the early history of specific lineages. More recently, the evolution of modern dog breeds seems to have been an iterative process that drew on a limited genetic toolkit to create remarkable phenotypic diversity.
Nature 03/2010; 464(7290):898-902. · 36.28 Impact Factor
-
Adam R Boyko,
Pascale Quignon,
Lin Li,
Jeffrey J Schoenebeck,
Jeremiah D Degenhardt,
Kirk E Lohmueller,
Keyan Zhao,
Abra Brisbin,
Heidi G Parker,
Bridgett M vonHoldt, [......],
Marta Castelhano,
Dana S Mosher,
Nathan B Sutter,
Gary S Johnson,
John Novembre,
Melissa J Hubisz,
Adam Siepel,
Robert K Wayne,
Carlos D Bustamante,
Elaine A Ostrander
ABSTRACT: Domestic dogs exhibit tremendous phenotypic diversity, including a greater variation in body size than any other terrestrial mammal. Here, we generate a high density map of canine genetic variation by genotyping 915 dogs from 80 domestic dog breeds, 83 wild canids, and 10 outbred African shelter dogs across 60,968 single-nucleotide polymorphisms (SNPs). Coupling this genomic resource with external measurements from breed standards and individuals as well as skeletal measurements from museum specimens, we identify 51 regions of the dog genome associated with phenotypic variation among breeds in 57 traits. The complex traits include average breed body size and external body dimensions and cranial, dental, and long bone shape and size with and without allometric scaling. In contrast to the results from association mapping of quantitative traits in humans and domesticated plants, we find that across dog breeds, a small number of quantitative trait loci (≤ 3) explain the majority of phenotypic variation for most of the traits we studied. In addition, many genomic regions show signatures of recent selection, with most of the highly differentiated regions being associated with breed-defining traits such as body size, coat characteristics, and ear floppiness. Our results demonstrate the efficacy of mapping multiple traits in the domestic dog using a database of genotyped individuals and highlight the important role human-directed selection has played in altering the genetic architecture of key traits in this important species.
PLoS Biology 01/2010; 8(8):e1000451. · 11.45 Impact Factor
-
Kirk E Lohmueller
ABSTRACT: Graydon et al. recently reported "there is a correspondence between STR polymorphism and physical traits, suggesting that STRs may not be just genetic 'junk', but may play a role in influencing phenotypic differences between people.". We argue that this conclusion is unwarranted in light of past and present work in human population genetics. Instead, Graydon et al.'s results can be explained solely by population history.
Forensic Science International: Genetics 10/2009; 4(4):273-4. · 2.42 Impact Factor
-
ABSTRACT: We propose a novel approximate-likelihood method to fit demographic models to human genomewide single-nucleotide polymorphism (SNP) data. We divide the genome into windows of constant genetic map width and then tabulate the number of distinct haplotypes and the frequency of the most common haplotype for each window. We summarize the data by the genomewide joint distribution of these two statistics, termed the HCN statistic. Coalescent simulations are used to generate the expected HCN statistic for different demographic parameters. The HCN statistic provides additional information for disentangling complex demography beyond statistics based on single-SNP frequencies. Application of our method to simulated data shows it can reliably infer parameters from growth and bottleneck models, even in the presence of recombination hotspots when properly modeled. We also examined how practical problems with genomewide data sets, such as errors in the genetic map, haplotype phase uncertainty, and SNP ascertainment bias, affect our method. Several modifications of our method served to make it robust to these problems. We have applied our method to data collected by Perlegen Sciences and find evidence for a severe population size reduction in northwestern Europe starting 32,500-47,500 years ago.
Genetics 04/2009; 182(1):217-31. · 4.01 Impact Factor
-
Kirk E Lohmueller,
Amit R Indap,
Steffen Schmidt,
Adam R Boyko,
Ryan D Hernandez,
Melissa J Hubisz,
John J Sninsky,
Thomas J White,
Shamil R Sunyaev,
Rasmus Nielsen,
Andrew G Clark,
Carlos D Bustamante
ABSTRACT: Quantifying the number of deleterious mutations per diploid human genome is of crucial concern to both evolutionary and medical geneticists. Here we combine genome-wide polymorphism data from PCR-based exon resequencing, comparative genomic data across mammalian species, and protein structure predictions to estimate the number of functionally consequential single-nucleotide polymorphisms (SNPs) carried by each of 15 African American (AA) and 20 European American (EA) individuals. We find that AAs show significantly higher levels of nucleotide heterozygosity than do EAs for all categories of functional SNPs considered, including synonymous, non-synonymous, predicted 'benign', predicted 'possibly damaging' and predicted 'probably damaging' SNPs. This result is wholly consistent with previous work showing higher overall levels of nucleotide variation in African populations than in Europeans. EA individuals, in contrast, have significantly more genotypes homozygous for the derived allele at synonymous and non-synonymous SNPs and for the damaging allele at 'probably damaging' SNPs than AAs do. For SNPs segregating only in one population or the other, the proportion of non-synonymous SNPs is significantly higher in the EA sample (55.4%) than in the AA sample (47.0%; P < 2.3 × 10^-37). We observe a similar proportional excess of SNPs that are inferred to be 'probably damaging' (15.9% in EA; 12.1% in AA; P < 3.3 × 10^-11). Using extensive simulations, we show that this excess proportion of segregating damaging alleles in Europeans is probably a consequence of a bottleneck that Europeans experienced at about the time of the migration out of Africa.
Nature 02/2008; 451(7181):994-7. · 36.28 Impact Factor
-
Adam B Olshen,
Bert Gold,
Kirk E Lohmueller,
Jeffery P Struewing,
Jaya Satagopan,
Stefan A Stefanov,
Eleazar Eskin,
Tomas Kirchhoff,
James A Lautenberger,
Robert J Klein,
Eitan Friedman,
Larry Norton,
Nathan A Ellis,
Agnes Viale,
Catherine S Lee,
Patrick I Borgen,
Andrew G Clark,
Kenneth Offit,
Jeff Boyd
ABSTRACT: Genetic isolates such as the Ashkenazi Jews (AJ) potentially offer advantages in mapping novel loci in whole genome disease association studies. To analyze patterns of genetic variation in AJ, genotypes of 101 healthy individuals were determined using the Affymetrix EAv3 500 K SNP array and compared to 60 CEPH-derived HapMap (CEU) individuals. 435,632 SNPs overlapped and met annotation criteria in the two groups.
A small but significant global difference in allele frequencies between AJ and CEU was demonstrated by a mean FST of 0.009 (P < 0.001); large regions that differed were found on chromosomes 2 and 6. Haplotype blocks inferred from pairwise linkage disequilibrium (LD) statistics (Haploview) as well as by expectation-maximization haplotype phase inference (HAP) showed a greater number of haplotype blocks in AJ compared to CEU by Haploview (50,397 vs. 44,169) or by HAP (59,269 vs. 54,457). Average haplotype blocks were smaller in AJ compared to CEU (e.g., 36.8 kb vs. 40.5 kb HAP). Analysis of global patterns of local LD decay for closely-spaced SNPs in CEU demonstrated more LD, while for SNPs further apart, LD was slightly greater in the AJ. A likelihood ratio approach showed that runs of homozygous SNPs were approximately 20% longer in AJ. A principal components analysis was sufficient to completely resolve the CEU from the AJ.
LD in the AJ versus the CEU was lower than expected by some measures and higher by others. Any putative advantage in whole genome association mapping using the AJ population will be highly dependent on regional LD structure.
BMC Genetics 01/2008; 9:14. · 2.47 Impact Factor
-
Adam R Boyko,
Scott H Williamson,
Amit R Indap,
Jeremiah D Degenhardt,
Ryan D Hernandez,
Kirk E Lohmueller,
Mark D Adams,
Steffen Schmidt,
John J Sninsky,
Shamil R Sunyaev,
Thomas J White,
Rasmus Nielsen,
Andrew G Clark,
Carlos D Bustamante
ABSTRACT: Quantifying the distribution of fitness effects among newly arising mutations in the human genome is key to resolving important debates in medical and evolutionary genetics. Here, we present a method for inferring this distribution using Single Nucleotide Polymorphism (SNP) data from a population with non-stationary demographic history (such as that of modern humans). Application of our method to 47,576 coding SNPs found by direct resequencing of 11,404 protein-coding genes in 35 individuals (20 European Americans and 15 African Americans) allows us to assess the relative contribution of demographic and selective effects to patterning amino acid variation in the human genome. We find evidence of an ancient population expansion in the sample with African ancestry and a relatively recent bottleneck in the sample with European ancestry. After accounting for these demographic effects, we find strong evidence for great variability in the selective effects of new amino acid replacing mutations. In both populations, the patterns of variation are consistent with a leptokurtic distribution of selection coefficients (e.g., gamma or log-normal) peaked near neutrality. Specifically, we predict 27-29% of amino acid changing (nonsynonymous) mutations are neutral or nearly neutral (|s|<0.01%), 30-42% are moderately deleterious (0.01%<|s|<1%), and nearly all the remainder are highly deleterious or lethal (|s|>1%). Our results are consistent with 10-20% of amino acid differences between humans and chimpanzees having been fixed by positive selection with the remainder of differences being neutral or nearly neutral. Our analysis also predicts that many of the alleles identified via whole-genome association mapping may be selectively neutral or (formerly) positively selected, implying that deleterious genetic variation affecting disease phenotype may be missed by this widely used approach for mapping genes underlying complex traits.
PLoS Genetics 01/2008; 4(5):e1000083. · 8.69 Impact Factor
-
ABSTRACT: Genetic variants that contribute to risk of common disease may differ in frequency across populations more than random variants in the genome do, perhaps because they have been exposed to population-specific natural selection. To assess this hypothesis empirically, we analyzed data from two groups of single-nucleotide polymorphisms (SNPs) that have shown reproducible (n = 9) or reported (n = 39) associations with common diseases. We compared the frequency differentiation (between Europeans and Africans) of the disease-associated SNPs with that of random SNPs in the genome. These common-disease-associated SNPs are not significantly more differentiated across populations than random SNPs. Thus, for the data examined here, ethnicity will not be a good predictor of genotype at many common-disease-associated SNPs, just as it is rarely a good predictor of genotype at random SNPs in the genome.
The American Journal of Human Genetics 02/2006; 78(1):130-6. · 10.60 Impact Factor
-
Nick Patterson,
Neil Hattangadi,
Barton Lane,
Kirk E Lohmueller,
David A Hafler,
Jorge R Oksenberg,
Stephen L Hauser,
Michael W Smith,
Stephen J O'Brien,
David Altshuler,
Mark J Daly,
David Reich
ABSTRACT: Admixture mapping (also known as "mapping by admixture linkage disequilibrium," or MALD) has been proposed as an efficient approach to localizing disease-causing variants that differ in frequency (because of either drift or selection) between two historically separated populations. Near a disease gene, patient populations descended from the recent mixing of two or more ethnic groups should have an increased probability of inheriting the alleles derived from the ethnic group that carries more disease-susceptibility alleles. The central attraction of admixture mapping is that, since gene flow has occurred recently in modern populations (e.g., in African and Hispanic Americans in the past 20 generations), it is expected that admixture-generated linkage disequilibrium should extend for many centimorgans. High-resolution marker sets are now becoming available to test this approach, but progress will require (a) computational methods to infer ancestral origin at each point in the genome and (b) empirical characterization of the general properties of linkage disequilibrium due to admixture. Here we describe statistical methods to estimate the ancestral origin of a locus on the basis of the composite genotypes of linked markers, and we show that this approach accurately estimates states of ancestral origin along the genome. We apply this approach to show that strong admixture linkage disequilibrium extends, on average, for 17 cM in African Americans. Finally, we present power calculations under varying models of disease risk, sample size, and proportions of ancestry. Studying approximately 2500 markers in approximately 2500 patients should provide power to detect many regions contributing to common disease. A particularly important result is that the power of an admixture mapping study to detect a locus will be nearly the same for a wide range of mixture scenarios: the mixture proportion should be 10%-90% from both ancestral populations.
The American Journal of Human Genetics 06/2004; 74(5):979-1000. · 10.60 Impact Factor
-
ABSTRACT: Association studies offer a potentially powerful approach to identify genetic variants that influence susceptibility to common disease, but are plagued by the impression that they are not consistently reproducible. In principle, the inconsistency may be due to false positive studies, false negative studies or true variability in association among different populations. The critical question is whether false positives overwhelmingly explain the inconsistency. We analyzed 301 published studies covering 25 different reported associations. There was a large excess of studies replicating the first positive reports, inconsistent with the hypothesis of no true positive associations (P < 10^-14). This excess of replications could not be reasonably explained by publication bias and was concentrated among 11 of the 25 associations. For 8 of these 11 associations, pooled analysis of follow-up studies yielded statistically significant replication of the first report, with modest estimated genetic effects. Thus, a sizable fraction (but under half) of reported associations have strong evidence of replication; for these, false negative, underpowered studies probably contribute to inconsistent replication. We conclude that there are probably many common variants in the human genome with modest but real effects on common disease risk, and that studies using large samples will convincingly identify such variants.
Nature Genetics 02/2003; 33(2):177-82. · 35.53 Impact Factor