Noah A Rosenberg

Stanford University, Palo Alto, California, United States

Publications (112) · 806.86 Total impact

  • ABSTRACT: As it becomes increasingly possible to obtain DNA sequences of orthologous genes from diverse sets of taxa, species trees are frequently being inferred from multilocus data. However, the behavior of many methods for performing this inference has remained largely unexplored. Some methods have been proven to be consistent given certain evolutionary models, whereas others rely on criteria that, although appropriate for many parameter values, have peculiar zones of the parameter space in which they fail to converge on the correct estimate as data sets increase in size. Here, using North American pines, we empirically evaluate the behavior of 24 strategies for species tree inference using three alternative outgroups (72 strategies total). The data consist of 120 individuals sampled in eight ingroup species from subsection Strobus and three outgroup species from subsection Gerardianae, spanning ~47 kilobases of sequence at 121 loci. Each "strategy" for inferring species trees consists of three features: a species tree construction method, a gene tree inference method, and a choice of outgroup. We use multivariate analysis techniques such as principal components analysis and hierarchical clustering to identify tree characteristics that are robustly observed across strategies, as well as to identify groups of strategies that produce trees with similar features. We find that strategies that construct species trees using only topological information cluster together and that strategies that use additional non-topological information (e.g., branch lengths) also cluster together. Strategies that utilize more than one individual within a species to infer gene trees tend to produce estimates of species trees that contain clades present in trees estimated by other strategies. Strategies that use the minimize-deep-coalescences criterion to construct species trees tend to produce species tree estimates that contain clades that are not present in trees estimated by the Concatenation, RTC, SMRT, STAR, and STEAC methods, and that in general are more balanced than those inferred by these other strategies. When constructing a species tree from a multilocus set of sequences, our observations provide a basis for interpreting differences in species tree estimates obtained via different approaches that have a two-stage structure in common, one step for gene tree estimation and a second step for species tree estimation. The methods explored here employ a number of distinct features of the data, and our analysis suggests that recovery of the same results from multiple methods that tend to differ in their patterns of inference can be a valuable tool for obtaining reliable estimates.
    BMC Evolutionary Biology 03/2014; 14(1):67. · 3.29 Impact Factor
  • Ethan M Jewett, Noah A Rosenberg
    ABSTRACT: Under the coalescent model, the random number n_t of lineages ancestral to a sample is nearly deterministic as a function of time when n_t is moderate to large in value, and it is well approximated by its expectation E[n_t]. In turn, this expectation is well approximated by simple deterministic functions that are easy to compute. Such deterministic functions have been applied to estimate allele age, effective population size, and genetic diversity, and they have been used to study properties of models of infectious disease dynamics. Although a number of simple approximations of E[n_t] have been derived and applied to problems of population-genetic inference, the theoretical accuracy of the formulas and the inferences obtained using these approximations is not known, and the range of problems to which they can be applied is not well understood. Here, we demonstrate general procedures by which the approximation n_t ≈ E[n_t] can be used to reduce the computational complexity of coalescent formulas, and we show that the resulting approximations converge to their true values under simple assumptions. Such approximations provide alternatives to exact formulas that are computationally intractable or numerically unstable when the number of sampled lineages is moderate or large. We also extend an existing class of approximations of E[n_t] to the case of multiple populations of time-varying size with migration among them. Our results facilitate the use of the deterministic approximation n_t ≈ E[n_t] for deriving functionally simple, computationally efficient, and numerically stable approximations of coalescent formulas under complicated demographic scenarios.
    Theoretical Population Biology 01/2014; · 1.24 Impact Factor
  • C V Than, N A Rosenberg
    ABSTRACT: We derive formulas for mean deep coalescence cost, for either a fixed species tree or a fixed gene tree, under probability distributions that satisfy the exchangeability property. We then apply the formulas to study mean deep coalescence cost under two commonly used exchangeable models—the uniform and Yule models. We find that mean deep coalescence cost, for either a fixed species tree or a fixed gene tree, tends to be larger for unbalanced trees than for balanced trees. These results provide a better understanding of the deep coalescence cost, as well as allow for the development of new species tree inference criteria.
    Discrete Applied Mathematics 01/2014; · 0.72 Impact Factor
  • Noah A Rosenberg
    Theoretical Population Biology 12/2013; · 1.24 Impact Factor
  • Genetics in medicine: official journal of the American College of Medical Genetics 09/2013; 15(9):753-4. · 3.92 Impact Factor
  • ABSTRACT: The recent dramatic cost reduction of next-generation sequencing technology enables investigators to assess most variants in the human genome to identify risk variants for complex diseases. However, sequencing large samples remains very expensive. For a study sample with existing genotype data, such as array data from genome-wide association studies, a cost-effective approach is to sequence a subset of the study sample, and then to impute the rest of the study sample using the sequenced subset as a reference panel. The use of such an internal reference panel identifies population-specific variants and avoids the problem of a substantial mismatch in ancestry background between the study population and the reference population. To efficiently select an internal panel, we introduce an idea of phylogenetic diversity from mathematical phylogenetics and comparative genomics. We propose the "most diverse reference panel," defined as the subset with the maximal "phylogenetic diversity," thereby incorporating individuals that span a diverse range of genotypes within the sample. Using data both from simulations and from the 1000 Genomes Project, we show that the most diverse reference panel can considerably improve the imputation accuracy compared to randomly selected reference panels, especially for the imputation of rare variants. The improvement in imputation accuracy holds across different marker densities, reference panel sizes, and lengths for the imputed segments. We thus propose a novel strategy for planning sequencing studies on samples with existing genotype data.
    Genetics 08/2013; · 4.39 Impact Factor
  • ABSTRACT: Exome sequencing offers the potential to study the population-genomic variables that underlie patterns of deleterious variation. Runs of homozygosity (ROH) are long stretches of consecutive homozygous genotypes probably reflecting segments shared identically by descent as the result of processes such as consanguinity, population size reduction, and natural selection. The relationship between ROH and patterns of predicted deleterious variation can provide insight into the way in which these processes contribute to the maintenance of deleterious variants. Here, we use exome sequencing to examine ROH in relation to the distribution of deleterious variation in 27 individuals of varying levels of apparent inbreeding from 6 human populations. A significantly greater fraction of all genome-wide predicted damaging homozygotes fall in ROH than would be expected from the corresponding fraction of nondamaging homozygotes in ROH (p < 0.001). This pattern is strongest for long ROH (p < 0.05). ROH, and especially long ROH, harbor disproportionately more deleterious homozygotes than would be expected on the basis of the total ROH coverage of the genome and the genomic distribution of nondamaging homozygotes. The results accord with a hypothesis that recent inbreeding, which generates long ROH, enables rare deleterious variants to exist in homozygous form. Thus, just as inbreeding can elevate the occurrence of rare recessive diseases that represent homozygotes for strongly deleterious mutations, inbreeding magnifies the occurrence of mildly deleterious variants as well.
    The American Journal of Human Genetics 06/2013; · 11.20 Impact Factor
  • ABSTRACT: Over the past two decades, microsatellite genotypes have provided the data for landmark studies of human population-genetic variation. However, the various microsatellite data sets have been prepared with different procedures and sets of markers, so that it has been difficult to synthesize available data for a comprehensive analysis. Here, we combine eight human population-genetic data sets at the 645 microsatellite loci they share in common, accounting for procedural differences in the production of the different data sets, to assemble a single data set containing 5,795 individuals from 267 worldwide populations. We perform a systematic analysis of genetic relatedness, detecting 240 intra-population and 92 inter-population pairs of previously unidentified close relatives and proposing standardized subsets of unrelated individuals for use in future studies. We then augment the human data with a data set of 84 chimpanzees at the 246 loci they share in common with the human samples. Multidimensional scaling and neighbor-joining analyses of these data sets offer new insights into the structure of human populations and enable a comparison of genetic variation patterns in chimpanzees with those in humans. Our combined data sets are the largest of their kind reported to date and provide a resource for use in human population-genetic studies.
    G3-Genes Genomes Genetics 03/2013; · 1.79 Impact Factor
  • Erkan O. Buzbas, Noah A. Rosenberg
    ABSTRACT: Approximate Bayesian computation (ABC) methods perform inference on model-specific parameters of mechanistically motivated parametric statistical models when evaluating likelihoods is difficult. Central to the success of ABC methods is computationally inexpensive simulation of data sets from the parametric model of interest. However, when simulating data sets from a model is so computationally expensive that the posterior distribution of parameters cannot be adequately sampled by ABC, inference is not straightforward. We present "approximate approximate Bayesian computation" (AABC), a class of methods that extends simulation-based inference by ABC to models in which simulating data is expensive. In AABC, we first simulate a limited number of data sets that is computationally feasible to simulate from the parametric model. We use these data sets as fixed background information to inform a non-mechanistic statistical model that approximates the correct parametric model and enables efficient simulation of a large number of data sets by Bayesian resampling methods. We show that under mild assumptions, the posterior distribution obtained by AABC converges to the posterior distribution obtained by ABC, as the number of data sets simulated from the parametric model and the sample size of the observed data set increase simultaneously. We illustrate the performance of AABC on a population-genetic model of natural selection, as well as on a model of the admixture history of hybrid populations.
    01/2013;
  • ABSTRACT: Neighbor-joining is one of the most widely used methods for constructing evolutionary trees. This approach from phylogenetics is often employed in population genetics, where distance matrices obtained from allele frequencies are used to produce a representation of population relationships in the form of a tree. In phylogenetics, the utility of neighbor-joining derives partly from a result that for a class of distance matrices including those that are additive or tree-like (generated by summing weights over the edges connecting pairs of taxa in a tree to obtain pairwise distances), application of neighbor-joining recovers exactly the underlying tree. For populations within a species, however, migration and admixture can produce distance matrices that reflect more complex processes than those obtained from the bifurcating trees typical in the multispecies context. Admixed populations (populations descended from recent mixture of groups that have long been separated) have been observed to be located centrally in inferred neighbor-joining trees, with short external branches incident to the path connecting their source populations. Here, using a simple model, we explore mathematically the behavior of an admixed population under neighbor-joining. We show that with an additive distance matrix, a population admixed among two source populations necessarily lies on the path between the sources. Relaxing the additivity requirement, we examine the smallest nontrivial case (four populations, one of which is admixed between two of the other three), showing that the two source populations never merge with each other before one of them merges with the admixed population. Furthermore, the distance on the constructed tree between the admixed population and either source population is always smaller than the distance between the source populations, and the external branch for the admixed population is always incident to the path connecting the sources. We define three properties that hold for four taxa and that we hypothesize are satisfied under more general conditions: antecedence of clustering, intermediacy of distances, and intermediacy of path lengths. Our findings can inform interpretations of neighbor-joining trees with admixed groups, and they provide an explanation for patterns observed in trees of human populations.
    Pacific Symposium on Biocomputing 01/2013;
  • Michael D Edge, Prakash Gorroochurn, Noah A Rosenberg
    ABSTRACT: Association mapping can be viewed as an application of population genetics and evolutionary biology to the problem of identifying genes causally connected to phenotypes. However, some population-genetic principles important to the design and analysis of association studies have not been widely understood or have even been generally misunderstood. Some of these principles underlie techniques that can aid in the discovery of genetic variants that influence phenotypes ('windfalls'), whereas others can interfere with study design or interpretation of results ('pitfalls'). Here, considering examples involving genetic variant discovery, linkage disequilibrium, power to detect associations, population stratification and genotype imputation, we address misunderstandings in the application of population genetics to association studies, and we illuminate how some surprising results in association contexts can be easily explained when considered from evolutionary and population-genetic perspectives. Through our examples, we argue that population-genetic thinking, which takes a theoretical view of the evolutionary forces that guide the emergence and propagation of genetic variants, substantially informs the design and interpretation of genetic association studies. In particular, population-genetic thinking sheds light on genetic confounding, on the relationships between association signals of typed markers and causal variants, and on the advantages and disadvantages of particular strategies for measuring genetic variation in association studies.
    Evolution, medicine, and public health. 01/2013; 2013(1):254-272.
  • Cuong V Than, Noah A Rosenberg
    ABSTRACT: In the minimizing-deep-coalescences (MDC) approach for species tree inference, a tree that has the minimal deep coalescence cost for reconciling a collection of gene trees is taken as an estimate of the species tree topology. The MDC method possesses the desirable Pareto property, and in practice it is quite accurate and computationally efficient. Here, in order to better understand the MDC method, we investigate some properties of the deep coalescence cost. We prove that the unit neighborhood of either a rooted species tree or a rooted gene tree under the deep coalescence cost is exactly the same as the tree's unit neighborhood under the rooted nearest-neighbor interchange (NNI) distance. Next, for a fixed species tree, we obtain the maximum deep coalescence cost across all gene trees as well as the number of gene trees that achieve the maximum cost. We also study corresponding problems for a fixed gene tree.
    IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM 10/2012; 10(1):61-72. · 2.25 Impact Factor
  • Lucy Huang, Erkan O Buzbas, Noah A Rosenberg
    ABSTRACT: Empirical studies have identified population-genetic factors as important determinants of the properties of genotype-imputation accuracy in imputation-based disease association studies. Here, we develop a simple coalescent model of three sequences that we use to explore the theoretical basis for the influence of these factors on genotype-imputation accuracy, under the assumption of infinitely-many-sites mutation. Employing a demographic model in which two populations diverged at a given time in the past, we derive the approximate expectation and variance of imputation accuracy in a study sequence sampled from one of the two populations, choosing between two reference sequences, one sampled from the same population as the study sequence and the other sampled from the other population. We show that, under this model, imputation accuracy, as measured by the proportion of polymorphic sites that are imputed correctly in the study sequence, increases in expectation with the mutation rate, the proportion of the markers in a chromosomal region that are genotyped, and the time to divergence between the study and reference populations. Each of these effects derives largely from an increase in information available for determining the reference sequence that is genetically most similar to the sequence targeted for imputation. We analyze as a function of divergence time the expected gain in imputation accuracy in the target using a reference sequence from the same population as the target rather than from the other population. Together with a growing body of empirical investigations of genotype imputation in diverse human populations, our modeling framework lays a foundation for extending imputation techniques to novel populations that have not yet been extensively examined.
    Theoretical Population Biology 10/2012; · 1.24 Impact Factor
  • Michael Degiorgio, Noah A Rosenberg
    ABSTRACT: Principal component (PC) maps, which plot the values of a given PC estimated on the basis of allele frequency variation at the geographic sampling locations of a set of populations, are often used to investigate the properties of past range expansions. Some studies have argued that in a range expansion, the axis of greatest variation (i.e., the first PC) is parallel to the axis of expansion. In contrast, others have identified a pattern in which the axis of greatest variation is perpendicular to the axis of expansion. Here, we seek to understand this difference in outcomes by investigating the effect of the geographic sampling scheme on the direction of the axis of greatest variation under a two-dimensional range expansion model. From datasets simulated using each of two different schemes for the geographic sampling of populations under the model, we create PC maps for the first PC. We find that depending on the geographic sampling scheme, the axis of greatest variation can be either parallel or perpendicular to the axis of expansion. We provide an explanation for this result in terms of intra- and inter-population coalescence times.
    Molecular Biology and Evolution 10/2012; · 10.35 Impact Factor
  • ABSTRACT: Genome-wide patterns of homozygosity runs and their variation across individuals provide a valuable and often untapped resource for studying human genetic diversity and evolutionary history. Using genotype data at 577,489 autosomal SNPs, we employed a likelihood-based approach to identify runs of homozygosity (ROH) in 1,839 individuals representing 64 worldwide populations, classifying them by length into three classes (short, intermediate, and long) with a model-based clustering algorithm. For each class, the number and total length of ROH per individual show considerable variation across individuals and populations. The total lengths of short and intermediate ROH per individual increase with the distance of a population from East Africa, in agreement with similar patterns previously observed for locus-wise homozygosity and linkage disequilibrium. By contrast, total lengths of long ROH show large interindividual variations that probably reflect recent inbreeding patterns, with higher values occurring more often in populations with known high frequencies of consanguineous unions. Across the genome, distributions of ROH are not uniform, and they have distinctive continental patterns. ROH frequencies across the genome are correlated with local genomic variables such as recombination rate, as well as with signals of recent positive selection. In addition, long ROH are more frequent in genomic regions harboring genes associated with autosomal-dominant diseases than in regions not implicated in Mendelian diseases. These results provide insight into the way in which homozygosity patterns are produced, and they generate baseline homozygosity patterns that can be used to aid homozygosity mapping of genes associated with recessive diseases.
    The American Journal of Human Genetics 08/2012; 91(2):275-92. · 11.20 Impact Factor
  • James H Degnan, Noah A Rosenberg, Tanja Stadler
    ABSTRACT: Ranked gene trees, which consider both the gene tree topology and the sequence in which gene lineages separate, can potentially provide a new source of information for use in modeling genealogies and performing inference of species trees. Recently, we have calculated the probability distribution of ranked gene trees under the standard multispecies coalescent model for the evolution of gene lineages along the branches of a fixed species tree, demonstrating the existence of anomalous ranked gene trees (ARGTs), in which a ranked gene tree that does not match the ranked species tree can have greater probability under the model than the matching ranked gene tree. Here, we fully characterize the set of unranked species tree topologies that give rise to ARGTs, showing that this set contains all species tree topologies with five or more taxa, with the exceptions of caterpillars and pseudocaterpillars. The results have implications for the use of ranked gene trees in phylogenetic inference.
    IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM 08/2012; · 2.25 Impact Factor
  • Chaolong Wang, Sebastian Zöllner, Noah A Rosenberg
    ABSTRACT: Multivariate statistical techniques such as principal components analysis (PCA) and multidimensional scaling (MDS) have been widely used to summarize the structure of human genetic variation, often in easily visualized two-dimensional maps. Many recent studies have reported similarity between geographic maps of population locations and MDS or PCA maps of genetic variation inferred from single-nucleotide polymorphisms (SNPs). However, this similarity has been evident primarily in a qualitative sense; and, because different multivariate techniques and marker sets have been used in different studies, it has not been possible to formally compare genetic variation datasets in terms of their levels of similarity with geography. In this study, using genome-wide SNP data from 128 populations worldwide, we perform a systematic analysis to quantitatively evaluate the similarity of genes and geography in different geographic regions. For each of a series of regions, we apply a Procrustes analysis approach to find an optimal transformation that maximizes the similarity between PCA maps of genetic variation and geographic maps of population locations. We consider examples in Europe, Sub-Saharan Africa, Asia, East Asia, and Central/South Asia, as well as in a worldwide sample, finding that significant similarity between genes and geography exists in general at different geographic levels. The similarity is highest in our examples for Asia and, once highly distinctive populations have been removed, Sub-Saharan Africa. Our results provide a quantitative assessment of the geographic structure of human genetic variation worldwide, supporting the view that geography plays a strong role in giving rise to human population structure.
    PLoS Genetics 08/2012; 8(8):e1002886. · 8.52 Impact Factor
  • Chaolong Wang, Kari B Schroeder, Noah A Rosenberg
    ABSTRACT: Allelic dropout is a commonly observed source of missing data in microsatellite genotypes, in which one or both allelic copies at a locus fail to be amplified by the polymerase chain reaction. Especially for samples with poor DNA quality, this problem causes a downward bias in estimates of observed heterozygosity and an upward bias in estimates of inbreeding, owing to mistaken classifications of heterozygotes as homozygotes when one of the two copies drops out. One general approach for avoiding allelic dropout involves repeated genotyping of homozygous loci to minimize the effects of experimental error. Existing computational alternatives often require replicate genotyping as well. These approaches, however, are costly and are suitable only when enough DNA is available for repeated genotyping. In this study, we propose a maximum-likelihood approach together with an expectation-maximization algorithm to jointly estimate allelic dropout rates and allele frequencies when only one set of nonreplicated genotypes is available. Our method considers estimates of allelic dropout caused by both sample-specific factors and locus-specific factors, and it allows for deviation from Hardy-Weinberg equilibrium owing to inbreeding. Using the estimated parameters, we correct the bias in the estimation of observed heterozygosity through the use of multiple imputations of alleles in cases where dropout might have occurred. With simulated data, we show that our method can (1) effectively reproduce patterns of missing data and heterozygosity observed in real data; (2) correctly estimate model parameters, including sample-specific dropout rates, locus-specific dropout rates, and the inbreeding coefficient; and (3) successfully correct the downward bias in estimating the observed heterozygosity. We find that our method is fairly robust to violations of model assumptions caused by population structure and by genotyping errors from sources other than allelic dropout. Because the data sets imputed under our model can be investigated in additional subsequent analyses, our method will be useful for preparing data for applications in diverse contexts in population genetics and molecular ecology.
    Genetics 07/2012; 192(2):651-69. · 4.39 Impact Factor
  • ABSTRACT: Recent studies have examined the influence on patterns of human genetic variation of a variety of cultural practices. In India, centuries-old marriage customs have introduced extensive social structuring into the contemporary population, potentially with significant consequences for genetic variation. Social stratification in India is evident as social classes that are defined by endogamous groups known as castes. Within a caste, there exist endogamous groups known as gols (marriage circles), each of which comprises a small number of exogamous gotra (lineages). Thus, while consanguinity is strictly avoided and some randomness in mate selection occurs within the gol, gene flow is limited with groups outside the gol. Gujarati Patels practice this form of "exogamic endogamy." We have analyzed genetic variation in one such group of Gujarati Patels, the Chha Gaam Patels (CGP), who comprise individuals from six villages. Population structure analysis of 1,200 autosomal loci offers support for the existence of distinctive multilocus genotypes in the CGP with respect to both non-Gujaratis and other Gujaratis, and indicates that CGP individuals are genetically very similar. Analysis of Y-chromosomal and mitochondrial haplotypes provides support for both patrilocal and patrilineal practices within the gol, and a low level of female gene flow into the gol. Our study illustrates how the practice of gol endogamy has introduced fine-scale genetic structure into the population of India, and contributes more generally to an understanding of the way in which marriage practices affect patterns of genetic variation.
    American Journal of Physical Anthropology 06/2012; 149(1):92-103. · 2.48 Impact Factor
  • Laura J Helmkamp, Ethan M Jewett, Noah A Rosenberg
    ABSTRACT: Among the methods currently available for inferring species trees from gene trees, the GLASS method of Mossel and Roch (2010), the Shallowest Divergence (SD) method of Maddison and Knowles (2006), the STEAC method of Liu et al. (2009), and a related method that we call Minimum Average Coalescence (MAC) are computationally efficient and provide branch length estimates. Further, GLASS and STEAC have been shown to be consistent estimators of tree topology under a multispecies coalescent model. However, divergence time estimates obtained with these methods are all systematically biased under the model because the pairwise interspecific gene divergence times on which they rely must be more ancient than the species divergence time. Jewett and Rosenberg (2012) derived an expression for the bias of GLASS and used it to propose an improved method that they termed iGLASS. Here, we derive the biases of SD, STEAC, and MAC, and we propose improved analogues of these methods that we call iSD, iSTEAC, and iMAC. We conduct simulations to compare the performance of these methods with their original counterparts and with GLASS and iGLASS, finding that each of them decreases the bias and mean squared error of pairwise divergence time estimates. The new methods can therefore contribute to improvements in the estimation of species trees from information on gene trees.
    Journal of computational biology: a journal of computational molecular cell biology 06/2012; 19(6):632-49. · 1.69 Impact Factor

Publication Stats

8k Citations
713 Downloads
806.86 Total Impact Points

Institutions

  • 1999–2014
    • Stanford University
      • Department of Biology
      Palo Alto, California, United States
    • University of Oxford
      • Department of Statistics
      Oxford, ENG, United Kingdom
  • 2013
    • University of Manitoba
      Winnipeg, Manitoba, Canada
  • 2009–2013
    • Tel Aviv University
      • Department of Zoology
      Tel Aviv, Tel Aviv, Israel
    • Masaryk University
      Brünn, South Moravian, Czech Republic
    • University of California, Davis
      • Department of Anthropology
      Davis, CA, United States
  • 2012
    • University of California, Berkeley
      • Department of Integrative Biology
      Berkeley, CA, United States
  • 2005–2012
    • University of Michigan
      • Department of Biostatistics
      • Department of Computational Medicine and Bioinformatics
      • Department of Human Genetics
      Ann Arbor, MI, United States
    • University of Texas Health Science Center at Houston
      • Human Genetics Center
      Houston, TX, United States
  • 2011
    • Statens Serum Institut
      København, Capital Region, Denmark
    • Brown University
      • Department of Ecology and Evolutionary Biology
      Providence, RI, United States
    • Johns Hopkins University
      • Department of Biostatistics
      Baltimore, MD, United States
  • 2008
    • Grenoble Institute of Technology
      Grenoble, Rhône-Alpes, France
  • 2002–2008
    • University of Southern California
      • Institute for Genetic Medicine
      • Department of Biological Sciences
      • Division of Molecular and Computational Biology
      Los Angeles, CA, United States
  • 2007
    • Concordia University–Ann Arbor
      Ann Arbor, Michigan, United States
  • 2006
    • Harvard University
      • Department of Biostatistics
      Boston, MA, United States
  • 2003
    • Vavilov Institute of General Genetics
      Moskva, Moscow, Russia