Noah A Rosenberg

Stanford University, Palo Alto, California, United States

Publications (121) · 847.75 Total impact

  • Erkan O. Buzbas, Noah A. Rosenberg
    ABSTRACT: Approximate Bayesian computation (ABC) methods perform inference on model-specific parameters of mechanistically motivated parametric models when evaluating likelihoods is difficult. Central to the success of ABC methods, which have been used frequently in biology, is computationally inexpensive simulation of data sets from the parametric model of interest. However, when simulating data sets from a model is so computationally expensive that the posterior distribution of parameters cannot be adequately sampled by ABC, inference is not straightforward. We present “approximate approximate Bayesian computation” (AABC), a class of computationally fast inference methods that extends ABC to models in which simulating data is expensive. In AABC, we first simulate a number of data sets small enough to be computationally feasible to simulate from the parametric model. Conditional on these data sets, we use a statistical model that approximates the correct parametric model and enables efficient simulation of a large number of data sets. We show that under mild assumptions, the posterior distribution obtained by AABC converges to the posterior distribution obtained by ABC, as the number of data sets simulated from the parametric model and the sample size of the observed data set increase. We demonstrate the performance of AABC on a population-genetic model of natural selection, as well as on a model of the admixture history of hybrid populations. This latter example illustrates how, in population genetics, AABC is of particular utility in scenarios that rely on conceptually straightforward but potentially slow forward-in-time simulations.
    Theoretical Population Biology 09/2014; · 1.24 Impact Factor
  • Amy Goldberg, Paul Verdu, Noah A Rosenberg
    ABSTRACT: Sex-biased admixture has been observed in a wide variety of admixed populations. Genetic variation in sex chromosomes and functions of quantities computed from sex chromosomes and autosomes have often been examined in order to infer patterns of sex-biased admixture, typically using statistical approaches that do not mechanistically model the complexity of a sex-specific history of admixture. Here, expanding on a model of Verdu & Rosenberg (2011) that did not include sex specificity, we develop a model that mechanistically examines sex-specific admixture histories. Under the model, multiple source populations contribute to an admixed population, potentially with their male and female contributions varying over time. In an admixed population descended from two source groups, we derive the moments of the distribution of the autosomal admixture fraction from a specific source population as a function of sex-specific introgression parameters and time. Considering admixture processes that are constant in time, we demonstrate that surprisingly, although the mean autosomal admixture fraction from a specific source population does not reveal a sex bias in the admixture history, the variance of autosomal admixture is informative about sex bias. Specifically, the long-term variance decreases as the sex bias from a contributing source population increases. This result can be viewed as analogous to the reduction in effective population size for populations with an unequal number of breeding males and females. Our approach suggests that it may be possible to use the effect of sex-biased admixture on autosomal DNA to assist with methods for inference of the history of complex sex-biased admixture processes.
    Genetics 09/2014; · 4.39 Impact Factor
  • C V Than, N A Rosenberg
    ABSTRACT: We derive formulas for mean deep coalescence cost, for either a fixed species tree or a fixed gene tree, under probability distributions that satisfy the exchangeability property. We then apply the formulas to study mean deep coalescence cost under two commonly used exchangeable models—the uniform and Yule models. We find that mean deep coalescence cost, for either a fixed species tree or a fixed gene tree, tends to be larger for unbalanced trees than for balanced trees. These results provide a better understanding of the deep coalescence cost, as well as allow for the development of new species tree inference criteria.
    Discrete Applied Mathematics 09/2014; · 0.72 Impact Factor
  • ABSTRACT: The initial contact of European populations with indigenous populations of the Americas produced diverse admixture processes across North, Central, and South America. Recent studies have examined the genetic structure of indigenous populations of Latin America and the Caribbean and their admixed descendants, reporting on the genomic impact of the history of admixture with colonizing populations of European and African ancestry. However, relatively little genomic research has been conducted on admixture in indigenous North American populations. In this study, we analyze genomic data at 475,109 single-nucleotide polymorphisms sampled in indigenous peoples of the Pacific Northwest in British Columbia and Southeast Alaska, populations with a well-documented history of contact with European and Asian traders, fishermen, and contract laborers. We find that the indigenous populations of the Pacific Northwest have higher gene diversity than Latin American indigenous populations. Among the Pacific Northwest populations, interior groups provide more evidence for East Asian admixture, whereas coastal groups have higher levels of European admixture. In contrast with many Latin American indigenous populations, the variance of admixture is high in each of the Pacific Northwest indigenous populations, as expected for recent and ongoing admixture processes. The results reveal some similarities but notable differences between admixture patterns in the Pacific Northwest and those in Latin America, contributing to a more detailed understanding of the genomic consequences of European colonization events throughout the Americas.
    PLoS Genetics 08/2014; 10(8):e1004530. · 8.52 Impact Factor
  • ABSTRACT: As it becomes increasingly possible to obtain DNA sequences of orthologous genes from diverse sets of taxa, species trees are frequently being inferred from multilocus data. However, the behavior of many methods for performing this inference has remained largely unexplored. Some methods have been proven to be consistent given certain evolutionary models, whereas others rely on criteria that, although appropriate for many parameter values, have peculiar zones of the parameter space in which they fail to converge on the correct estimate as data sets increase in size. Here, using North American pines, we empirically evaluate the behavior of 24 strategies for species tree inference using three alternative outgroups (72 strategies total). The data consist of 120 individuals sampled in eight ingroup species from subsection Strobus and three outgroup species from subsection Gerardianae, spanning ~47 kilobases of sequence at 121 loci. Each "strategy" for inferring species trees consists of three features: a species tree construction method, a gene tree inference method, and a choice of outgroup. We use multivariate analysis techniques such as principal components analysis and hierarchical clustering to identify tree characteristics that are robustly observed across strategies, as well as to identify groups of strategies that produce trees with similar features. We find that strategies that construct species trees using only topological information cluster together and that strategies that use additional non-topological information (e.g., branch lengths) also cluster together. Strategies that utilize more than one individual within a species to infer gene trees tend to produce estimates of species trees that contain clades present in trees estimated by other strategies.
Strategies that use the minimize-deep-coalescences criterion to construct species trees tend to produce species tree estimates that contain clades that are not present in trees estimated by the Concatenation, RTC, SMRT, STAR, and STEAC methods, and that in general are more balanced than those inferred by these other strategies. When constructing a species tree from a multilocus set of sequences, our observations provide a basis for interpreting differences in species tree estimates obtained via different approaches that have a two-stage structure in common, one step for gene tree estimation and a second step for species tree estimation. The methods explored here employ a number of distinct features of the data, and our analysis suggests that recovery of the same results from multiple methods that tend to differ in their patterns of inference can be a valuable tool for obtaining reliable estimates.
    BMC Evolutionary Biology 03/2014; 14(1):67. · 3.29 Impact Factor
  • Ethan M Jewett, Noah A Rosenberg
    ABSTRACT: Under the coalescent model, the random number n_t of lineages ancestral to a sample is nearly deterministic as a function of time when n_t is moderate to large in value, and it is well approximated by its expectation E[n_t]. In turn, this expectation is well approximated by simple deterministic functions that are easy to compute. Such deterministic functions have been applied to estimate allele age, effective population size, and genetic diversity, and they have been used to study properties of models of infectious disease dynamics. Although a number of simple approximations of E[n_t] have been derived and applied to problems of population-genetic inference, the theoretical accuracy of the formulas and the inferences obtained using these approximations is not known, and the range of problems to which they can be applied is not well understood. Here, we demonstrate general procedures by which the approximation n_t ≈ E[n_t] can be used to reduce the computational complexity of coalescent formulas, and we show that the resulting approximations converge to their true values under simple assumptions. Such approximations provide alternatives to exact formulas that are computationally intractable or numerically unstable when the number of sampled lineages is moderate or large. We also extend an existing class of approximations of E[n_t] to the case of multiple populations of time-varying size with migration among them. Our results facilitate the use of the deterministic approximation n_t ≈ E[n_t] for deriving functionally simple, computationally efficient, and numerically stable approximations of coalescent formulas under complicated demographic scenarios.
    Theoretical Population Biology 01/2014; · 1.24 Impact Factor
  • Trevor J Pemberton, Noah A Rosenberg
    ABSTRACT: Culturally driven marital practices provide a key instance of an interaction between social and genetic processes in shaping patterns of human genetic variation, producing, for example, increased identity by descent through consanguineous marriage. A commonly used measure to quantify identity by descent in an individual is the inbreeding coefficient, a quantity that reflects not only consanguinity, but also other aspects of kinship in the population to which the individual belongs. Here, in populations worldwide, we examine the relationship between genomic estimates of the inbreeding coefficient and population patterns in genetic variation.
    Human Heredity 01/2014; 77(1-4):37-48. · 1.57 Impact Factor
  • Michael D. Edge, Noah A. Rosenberg
    ABSTRACT: F_ST is one of the most frequently used indices of genetic differentiation among groups. Though F_ST takes values between 0 and 1, authors going back to Wright have noted that under many circumstances, F_ST is constrained to be less than 1. Recently, we showed that at a genetic locus with an unspecified number of alleles, F_ST for two subpopulations is strictly bounded from above by functions of both the frequency of the most frequent allele (M) and the homozygosity of the total population (H_T). In the two-subpopulation case, F_ST can equal one only when the frequency of the most frequent allele and the total homozygosity are 1/2. Here, we extend this work by deriving strict bounds on F_ST for two subpopulations when the number of alleles at the locus is specified to be I. We show that restricting to I alleles produces the same upper bound on F_ST over much of the allowable domain for M and H_T, and we derive more restrictive bounds in the windows M ∈ [1/I, 1/(I−1)) and H_T ∈ [1/I, I/(I^2−1)). These results extend our understanding of the behavior of F_ST in relation to other population-genetic statistics.
    Theoretical Population Biology 01/2014; · 1.24 Impact Factor
  • Noah A Rosenberg
    Theoretical Population Biology 12/2013; · 1.24 Impact Factor
  • Noah A Rosenberg, Steven P Weitzman
    ABSTRACT: The year 2014 marks the 35th anniversary of two noteworthy papers, one in this journal and the other in the American Journal of Human Genetics, posing the same famous question: are the different Jewish populations from around Europe, the Middle East, and North Africa more genetically similar to each other, or are they more similar to the local non-Jewish populations in the regions where they were historically located? Both studies gathered blood group and protein variation data from a variety of Jewish and non-Jewish populations, compiling significant "classical marker" data sets commensurate with the standard for human population genetic studies at the time. Writing in the American Journal of Human Genetics, Karlin et al. (1979) reported, "We found the Ashkenazi, Sephardi, and Iraqi Jewish populations to be consistently close in genetic constitution and distant from all the other populations," concluding that, in fact, the Jewish populations were generally more genetically similar to each other. In Human Biology, Carmelli and Cavalli-Sforza (1979) wrote, "A wide scatter of the Jews was observed among clusters of non-Jews," finding that, on the contrary, the Jewish groups were largely more similar to the local non-Jewish populations. While these studies used some of the best statistics and data available at the time, they highlight the dramatic changes that have taken place in human population genetics research over the last 35 years. The field has proceeded through a succession of new types of genetic markers, the size of classical data sets has been spectacularly superseded, and the effort to understand new and larger collections of markers has provided many novel methods for the statistical toolbox of population genetics. 
Further, it has become clear that levels of similarity in human populations are sufficient that the resolution of population relationships among closely related groups often requires both an amount of data and a computational capacity that would have been unimaginable to researchers working in 1979. What has not changed, however, is the interest in questions about the genetics of Jewish populations—in fact, it has intensified. From the viewpoint of population genetics, the history of the various Jewish populations provides a scenario capable of inspiring and testing new population genetic methods, a rare case in which multiple groups with a component of shared identity and descent have lived over a large geographic range for a long period of time in a region of the world with a deep written record. From the viewpoint of scholarly fields that treat Jewish culture and history as the object of investigation, the use of genetics as an approach for understanding Jewish populations and their history taps into an intrinsic Jewish cultural interest in origins and migrations, a recognition of nonidentical but overlapping senses of Jewish group membership—from cultural to religious to genealogical—and the centrality to Jewish culture of the inheritance of Jewishness within families, as reflected in the title phrase, "from generation to generation." Seeking to advance and understand trends in the genetics of Jewish populations, this special issue focuses on Jewish population genetics, setting new developments in relation not only to past population genetic studies but also in the broader context of Jewish studies scholarship. The special issue builds upon a course of the same name that we held jointly in the biology and Jewish studies programs at Stanford University in the autumn of 2012, featuring the issue’s contributors as guest lecturers. 
Human population genetics is, in part, a form of historical endeavor, potentially illuminating the effects of social practices such as endogamy and conversion, the history of population relationships, and the magnitude, direction, and timing of migration events. At the same time, the field can be viewed as historically situated, with its underlying assumptions, its expression in language, and its cultural reverberations and social implications subject to research in their own right. As a collection of articles spanning multiple forms of inquiry, this special issue aims to both present and contextualize current research, discussing its cultural environment and the challenges that lie ahead. Two research reports in this issue present modern studies in Jewish population genetics. Peter Oefner and colleagues investigate genetic variation in the Samaritans, a small Middle Eastern population...
    Human Biology 12/2013; 85(6):817-824. · 1.52 Impact Factor
  • ABSTRACT: The Samaritans are a group of some 750 indigenous Middle Eastern people, about half of whom live in Holon, a suburb of Tel Aviv, and the other half near Nablus. The Samaritan population is believed to have numbered more than a million in late Roman times but less than 150 in 1917. The ancestry of the Samaritans has been subject to controversy from late Biblical times to the present. In this study, liquid chromatography/electrospray ionization/quadrupole ion trap mass spectrometry was used to allelotype 13 Y-chromosomal and 15 autosomal microsatellites in a sample of 12 Samaritans chosen to have as low a level of relationship as possible, and 461 Jews and non-Jews. Estimation of genetic distances between the Samaritans and seven Jewish and three non-Jewish populations from Israel, as well as populations from Africa, Pakistan, Turkey, and Europe, revealed that the Samaritans were closely related to Cohanim. This result supports the position of the Samaritans that they are descendants from the tribes of Israel dating to before the Assyrian exile in 722-720 BCE. In concordance with previously published single-nucleotide polymorphism haplotypes, each Samaritan family, with the exception of the Samaritan Cohen lineage, was observed to carry a distinctive Y-chromosome short tandem repeat haplotype that was not more than one mutation removed from the six-marker Cohen modal haplotype.
    Human Biology 12/2013; 85(6):825-858. · 1.52 Impact Factor
  • ABSTRACT: The origin and history of the Ashkenazi Jewish population have long been of great interest, and advances in high-throughput genetic analysis have recently provided a new approach for investigating these topics. We and others have argued on the basis of genome-wide data that the Ashkenazi Jewish population derives its ancestry from a combination of sources tracing to both Europe and the Middle East. It has been claimed, however, through a reanalysis of some of our data, that a large part of the ancestry of the Ashkenazi population originates with the Khazars, a Turkic-speaking group that lived to the north of the Caucasus region ~1,000 years ago. Because the Khazar population has left no obvious modern descendants that could enable a clear test for a contribution to Ashkenazi Jewish ancestry, the Khazar hypothesis has been difficult to examine using genetics. Furthermore, because only limited genetic data have been available from the Caucasus region, and because these data have been concentrated in populations that are genetically close to populations from the Middle East, the attribution of any signal of Ashkenazi-Caucasus genetic similarity to Khazar ancestry rather than shared ancestral Middle Eastern ancestry has been problematic. Here, through integration of genotypes from newly collected samples with data from several of our past studies, we have assembled the largest data set available to date for assessment of Ashkenazi Jewish genetic origins. This data set contains genome-wide single-nucleotide polymorphisms in 1,774 samples from 106 Jewish and non-Jewish populations that span the possible regions of potential Ashkenazi ancestry: Europe, the Middle East, and the region historically associated with the Khazar Khaganate. The data set includes 261 samples from 15 populations from the Caucasus region and the region directly to its north, samples that have not previously been included alongside Ashkenazi Jewish samples in genomic studies.
Employing a variety of standard techniques for the analysis of population-genetic structure, we found that Ashkenazi Jews share the greatest genetic ancestry with other Jewish populations and, among non-Jewish populations, with groups from Europe and the Middle East. No particular similarity of Ashkenazi Jews to populations from the Caucasus is evident, particularly populations that most closely represent the Khazar region. Thus, analysis of Ashkenazi Jews together with a large sample from the region of the Khazar Khaganate corroborates the earlier results that Ashkenazi Jews derive their ancestry primarily from populations of the Middle East and Europe, that they possess considerable shared ancestry with other Jewish populations, and that there is no indication of a significant genetic contribution either from within or from north of the Caucasus region.
    Human Biology 12/2013; 85(6):859-900. · 1.52 Impact Factor
  • Genetics in medicine: official journal of the American College of Medical Genetics 09/2013; 15(9):753-4. · 3.92 Impact Factor
  • ABSTRACT: The recent dramatic cost reduction of next-generation sequencing technology enables investigators to assess most variants in the human genome to identify risk variants for complex diseases. However, sequencing large samples remains very expensive. For a study sample with existing genotype data, such as array data from genome-wide association studies, a cost-effective approach is to sequence a subset of the study sample, and then to impute the rest of the study sample using the sequenced subset as a reference panel. The use of such an internal reference panel identifies population-specific variants and avoids the problem of a substantial mismatch in ancestry background between the study population and the reference population. To efficiently select an internal panel, we introduce an idea of phylogenetic diversity from mathematical phylogenetics and comparative genomics. We propose the "most diverse reference panel," defined as the subset with the maximal "phylogenetic diversity," thereby incorporating individuals that span a diverse range of genotypes within the sample. Using data both from simulations and from the 1000 Genomes Project, we show that the most diverse reference panel can considerably improve the imputation accuracy compared to randomly selected reference panels, especially for the imputation of rare variants. The improvement in imputation accuracy holds across different marker densities, reference panel sizes, and lengths for the imputed segments. We thus propose a novel strategy for planning sequencing studies on samples with existing genotype data.
    Genetics 08/2013; · 4.39 Impact Factor
  • ABSTRACT: Exome sequencing offers the potential to study the population-genomic variables that underlie patterns of deleterious variation. Runs of homozygosity (ROH) are long stretches of consecutive homozygous genotypes probably reflecting segments shared identically by descent as the result of processes such as consanguinity, population size reduction, and natural selection. The relationship between ROH and patterns of predicted deleterious variation can provide insight into the way in which these processes contribute to the maintenance of deleterious variants. Here, we use exome sequencing to examine ROH in relation to the distribution of deleterious variation in 27 individuals of varying levels of apparent inbreeding from 6 human populations. A significantly greater fraction of all genome-wide predicted damaging homozygotes fall in ROH than would be expected from the corresponding fraction of nondamaging homozygotes in ROH (p < 0.001). This pattern is strongest for long ROH (p < 0.05). ROH, and especially long ROH, harbor disproportionately more deleterious homozygotes than would be expected on the basis of the total ROH coverage of the genome and the genomic distribution of nondamaging homozygotes. The results accord with a hypothesis that recent inbreeding, which generates long ROH, enables rare deleterious variants to exist in homozygous form. Thus, just as inbreeding can elevate the occurrence of rare recessive diseases that represent homozygotes for strongly deleterious mutations, inbreeding magnifies the occurrence of mildly deleterious variants as well.
    The American Journal of Human Genetics 06/2013; · 11.20 Impact Factor
  • ABSTRACT: Over the past two decades, microsatellite genotypes have provided the data for landmark studies of human population-genetic variation. However, the various microsatellite data sets have been prepared with different procedures and sets of markers, so that it has been difficult to synthesize available data for a comprehensive analysis. Here, we combine eight human population-genetic data sets at the 645 microsatellite loci they share in common, accounting for procedural differences in the production of the different data sets, to assemble a single data set containing 5,795 individuals from 267 worldwide populations. We perform a systematic analysis of genetic relatedness, detecting 240 intra-population and 92 inter-population pairs of previously unidentified close relatives and proposing standardized subsets of unrelated individuals for use in future studies. We then augment the human data with a data set of 84 chimpanzees at the 246 loci they share in common with the human samples. Multidimensional scaling and neighbor-joining analyses of these data sets offer new insights into the structure of human populations and enable a comparison of genetic variation patterns in chimpanzees with those in humans. Our combined data sets are the largest of their kind reported to date and provide a resource for use in human population-genetic studies.
    G3-Genes Genomes Genetics 03/2013; · 1.79 Impact Factor
  • Erkan O. Buzbas, Noah A. Rosenberg
    ABSTRACT: Approximate Bayesian computation (ABC) methods perform inference on model-specific parameters of mechanistically motivated parametric statistical models when evaluating likelihoods is difficult. Central to the success of ABC methods is computationally inexpensive simulation of data sets from the parametric model of interest. However, when simulating data sets from a model is so computationally expensive that the posterior distribution of parameters cannot be adequately sampled by ABC, inference is not straightforward. We present "approximate approximate Bayesian computation" (AABC), a class of methods that extends simulation-based inference by ABC to models in which simulating data is expensive. In AABC, we first simulate a limited number of data sets that is computationally feasible to simulate from the parametric model. We use these data sets as fixed background information to inform a non-mechanistic statistical model that approximates the correct parametric model and enables efficient simulation of a large number of data sets by Bayesian resampling methods. We show that under mild assumptions, the posterior distribution obtained by AABC converges to the posterior distribution obtained by ABC, as the number of data sets simulated from the parametric model and the sample size of the observed data set increase simultaneously. We illustrate the performance of AABC on a population-genetic model of natural selection, as well as on a model of the admixture history of hybrid populations.
  • ABSTRACT: Neighbor-joining is one of the most widely used methods for constructing evolutionary trees. This approach from phylogenetics is often employed in population genetics, where distance matrices obtained from allele frequencies are used to produce a representation of population relationships in the form of a tree. In phylogenetics, the utility of neighbor-joining derives partly from a result that for a class of distance matrices including those that are additive or tree-like (generated by summing weights over the edges connecting pairs of taxa in a tree to obtain pairwise distances), application of neighbor-joining recovers exactly the underlying tree. For populations within a species, however, migration and admixture can produce distance matrices that reflect more complex processes than those obtained from the bifurcating trees typical in the multispecies context. Admixed populations (populations descended from recent mixture of groups that have long been separated) have been observed to be located centrally in inferred neighbor-joining trees, with short external branches incident to the path connecting their source populations. Here, using a simple model, we explore mathematically the behavior of an admixed population under neighbor-joining. We show that with an additive distance matrix, a population admixed among two source populations necessarily lies on the path between the sources. Relaxing the additivity requirement, we examine the smallest nontrivial case (four populations, one of which is admixed between two of the other three), showing that the two source populations never merge with each other before one of them merges with the admixed population. Furthermore, the distance on the constructed tree between the admixed population and either source population is always smaller than the distance between the source populations, and the external branch for the admixed population is always incident to the path connecting the sources.
We define three properties that hold for four taxa and that we hypothesize are satisfied under more general conditions: antecedence of clustering, intermediacy of distances, and intermediacy of path lengths. Our findings can inform interpretations of neighbor-joining trees with admixed groups, and they provide an explanation for patterns observed in trees of human populations.
    Pacific Symposium on Biocomputing 01/2013;
  • Noah A Rosenberg
    ABSTRACT: A coalescent history is an assignment of branches of a gene tree to branches of a species tree on which coalescences in the gene tree occur. The number of coalescent histories for a pair consisting of a labeled gene tree topology and a labeled species tree topology is important in gene tree probability computations, and more generally, in studying evolutionary possibilities for gene trees on species trees. Defining the T_r-caterpillar-like family as a sequence of n-taxon trees constructed by replacing the r-taxon subtree of n-taxon caterpillars by a specific r-taxon labeled topology T_r, we examine the number of coalescent histories for caterpillar-like families with matching gene tree and species tree labeled topologies. For each T_r with size r ≤ 8, we compute the number of coalescent histories for n-taxon trees in the T_r-caterpillar-like family. Next, as n → ∞, we find that the limiting ratio of the numbers of coalescent histories for the T_r family and caterpillars themselves is correlated with the number of labeled histories for T_r. The results support a view that large numbers of coalescent histories occur when a tree has both a relatively balanced subtree and a high tree depth, contributing to deeper understanding of the combinatorics of gene trees and species trees.
    IEEE/ACM Transactions on Computational Biology and Bioinformatics 01/2013; 10(5):1253-62. · 2.25 Impact Factor
  •
    Michael D Edge, Prakash Gorroochurn, Noah A Rosenberg
    ABSTRACT: Association mapping can be viewed as an application of population genetics and evolutionary biology to the problem of identifying genes causally connected to phenotypes. However, some population-genetic principles important to the design and analysis of association studies have not been widely understood or have even been generally misunderstood. Some of these principles underlie techniques that can aid in the discovery of genetic variants that influence phenotypes ('windfalls'), whereas others can interfere with study design or interpretation of results ('pitfalls'). Here, considering examples involving genetic variant discovery, linkage disequilibrium, power to detect associations, population stratification and genotype imputation, we address misunderstandings in the application of population genetics to association studies, and we illuminate how some surprising results in association contexts can be easily explained when considered from evolutionary and population-genetic perspectives. Through our examples, we argue that population-genetic thinking, which takes a theoretical view of the evolutionary forces that guide the emergence and propagation of genetic variants, substantially informs the design and interpretation of genetic association studies. In particular, population-genetic thinking sheds light on genetic confounding, on the relationships between association signals of typed markers and causal variants, and on the advantages and disadvantages of particular strategies for measuring genetic variation in association studies.
    Evolution, Medicine, and Public Health 01/2013; 2013(1):254-272.

Publication Stats

10k Citations
847.75 Total Impact Points


  • 1999–2014
    • Stanford University
      • Department of Biology
      Palo Alto, California, United States
    • University of Oxford
      • Department of Statistics
      Oxford, ENG, United Kingdom
  • 2013
    • University of Manitoba
      Winnipeg, Manitoba, Canada
  • 2009–2013
    • Tel Aviv University
      • Department of Zoology
      Tel Aviv, Tel Aviv, Israel
    • Masaryk University
      Brno, South Moravian Region, Czech Republic
    • University of California, Davis
      • Department of Anthropology
      Davis, CA, United States
  • 2012
    • Uppsala University
      Uppsala, Uppsala, Sweden
    • University of California, Berkeley
      • Department of Integrative Biology
      Berkeley, CA, United States
  • 2005–2012
    • University of Michigan
      • Department of Biostatistics
      • Department of Computational Medicine and Bioinformatics
      • Department of Human Genetics
      Ann Arbor, MI, United States
    • University of Texas Health Science Center at Houston
      • Human Genetics Center
      Houston, TX, United States
  • 2011
    • Brown University
      • Department of Ecology and Evolutionary Biology
      Providence, RI, United States
    • Statens Serum Institut
      Copenhagen, Capital Region, Denmark
    • Johns Hopkins University
      • Department of Biostatistics
      Baltimore, MD, United States
  • 2008
    • Grenoble Institute of Technology
      Grenoble, Rhône-Alpes, France
  • 2002–2008
    • University of Southern California
      • Institute for Genetic Medicine
      • Department of Biological Sciences
      • Division of Molecular and Computational Biology
      Los Angeles, CA, United States
  • 2007
    • Concordia University–Ann Arbor
      Ann Arbor, Michigan, United States
  • 2006
    • Harvard University
      • Department of Biostatistics
      Boston, MA, United States
  • 2003
    • Vavilov Institute of General Genetics
      Moscow, Russia