S Tavaré

University of Cambridge, Cambridge, England, United Kingdom

Publications (79) · 354.38 Total impact

  • ABSTRACT: Lineage-tracing approaches, widely used to characterize stem cell populations, rely on the specificity and stability of individual markers for accurate results. We present a method in which genetic labeling in the intestinal epithelium is acquired as a mutation-induced clonal mark during DNA replication. By determining the rate of mutation in vivo and combining this data with the known neutral-drift dynamics that describe intestinal stem cell replacement, we quantify the number of functional stem cells in crypts and adenomas. Contrary to previous reports, we find that significantly lower numbers of "working" stem cells are present in the intestinal epithelium (five to seven per crypt) and in adenomas (nine per gland), and that those stem cells are also replaced at a significantly lower rate. These findings suggest that the bulk of tumor stem cell divisions serve only to replace stem cell loss, with rare clonal victors driving gland repopulation and tumor growth.
    Cell Stem Cell 09/2013; · 23.56 Impact Factor
  • ABSTRACT: Dynamic activity of signaling pathways, such as Notch, is vital to achieve correct development and homeostasis. However, most studies assess output many hours or days after initiation of signaling, once the outcome has been consolidated. Here we analyze genome-wide changes in transcript levels, binding of the Notch pathway transcription factor, CSL [Suppressor of Hairless, Su(H), in Drosophila], and RNA Polymerase II (Pol II) immediately following a short pulse of Notch stimulation. A total of 154 genes showed significant differential expression (DE) over time, and their expression profiles stratified into 14 clusters based on the timing, magnitude, and direction of DE. E(spl) genes were the most rapidly upregulated, with Su(H), Pol II, and transcript levels increasing within 5-10 minutes. Other genes had a more delayed response, the timing of which was largely unaffected by more prolonged Notch activation. Neither Su(H) binding nor poised Pol II could fully explain the differences between profiles. Instead, our data indicate that regulatory interactions, driven by the early-responding E(spl)bHLH genes, are required. Proposed cross-regulatory relationships were validated in vivo and in cell culture, supporting the view that feed-forward repression by E(spl)bHLH/Hes shapes the response of late-responding genes. Based on these data, we propose a model in which Hes genes are responsible for co-ordinating the Notch response of a wide spectrum of other targets, explaining the critical functions these key regulators play in many developmental and disease contexts.
    PLoS Genetics 01/2013; 9(1):e1003162. · 8.52 Impact Factor
  • ABSTRACT: To identify novel dynamic patterns of gene expression, we develop a statistical method to cluster noisy measurements of gene expression collected from multiple replicates at multiple time points, with an unknown number of clusters. We propose a random-effects mixture model coupled with a Dirichlet-process prior for clustering. The mixture model formulation allows for probabilistic cluster assignments. The random-effects formulation allows for attributing the total variability in the data to the sources that are consistent with the experimental design, particularly when the noise level is high and the temporal dependence is not strong. The Dirichlet-process prior induces a prior distribution on partitions and helps to estimate the number of clusters (or mixture components) from the data. We further tackle two challenges associated with Dirichlet-process prior-based methods. One is efficient sampling. We develop a novel Metropolis-Hastings Markov Chain Monte Carlo (MCMC) procedure to sample the partitions. The other is efficient use of the MCMC samples in forming clusters. We propose a two-step procedure for posterior inference, which involves resampling and relabeling, to estimate the posterior allocation probability matrix. This matrix can be directly used in cluster assignments, while describing the uncertainty in clustering. We demonstrate the effectiveness of our model and sampling procedure through simulated data. Applying our method to a real data set collected from Drosophila adult muscle cells after five-minute Notch activation, we identify 14 clusters of different transcriptional responses among 163 differentially expressed genes, which provides several novel insights into underlying transcriptional mechanisms in the Notch signaling pathway. The algorithm developed here is implemented in the R package DIRECT.
    The Annals of Applied Statistics 10/2012; 7(3). · 2.24 Impact Factor
  • ABSTRACT: The elucidation of breast cancer subgroups and their molecular drivers requires integrated views of the genome and transcriptome from representative numbers of patients. We present an integrated analysis of copy number and gene expression in a discovery and validation set of 997 and 995 primary breast tumours, respectively, with long-term clinical follow-up. Inherited variants (copy number variants and single nucleotide polymorphisms) and acquired somatic copy number aberrations (CNAs) were associated with expression in ~40% of genes, with the landscape dominated by cis- and trans-acting CNAs. By delineating expression outlier genes driven in cis by CNAs, we identified putative cancer genes, including deletions in PPP2R2A, MTAP and MAP2K4. Unsupervised analysis of paired DNA–RNA profiles revealed novel subgroups with distinct clinical outcomes, which reproduced in the validation cohort. These include a high-risk, oestrogen-receptor-positive 11q13/14 cis-acting subgroup and a favourable prognosis subgroup devoid of CNAs. Trans-acting aberration hotspots were found to modulate subgroup-specific gene networks, including a TCR deletion-mediated adaptive immune response in the ‘CNA-devoid’ subgroup and a basal-specific chromosome 5 deletion-associated mitotic network. Our results provide a novel molecular stratification of the breast cancer population, derived from the impact of somatic CNAs on the transcriptome.
    Nature 04/2012; 486(7403):346-52. · 38.60 Impact Factor
  • ABSTRACT: The cancer stem cell (CSC) concept is a highly debated topic in cancer research. While experimental evidence in favor of the cancer stem cell theory is apparently abundant, the results are often criticized as being difficult to interpret. An important reason for this is that most experimental data that support this model rely on transplantation studies. In this study we use a novel cellular Potts model to elucidate the dynamics of established malignancies that are driven by a small subset of CSCs. Our results demonstrate that epigenetic mutations that occur during mitosis display highly altered dynamics in CSC-driven malignancies compared to a classical, non-hierarchical model of growth. In particular, the heterogeneity observed in CSC-driven tumors is considerably higher. We speculate that this feature could be used in combination with epigenetic (methylation) sequencing studies of human malignancies to prove or refute the CSC hypothesis in established tumors without the need for transplantation. Moreover our tumor growth simulations indicate that CSC-driven tumors display evolutionary features that can be considered beneficial during tumor progression. Besides an increased heterogeneity they also exhibit properties that allow the escape of clones from local fitness peaks. This leads to more aggressive phenotypes in the long run and makes the neoplasm more adaptable to stringent selective forces such as cancer treatment. Indeed when therapy is applied the clone landscape of the regrown tumor is more aggressive with respect to the primary tumor, whereas the classical model demonstrated similar patterns before and after therapy. Understanding these often counter-intuitive fundamental properties of (non-)hierarchically organized malignancies is a crucial step in validating the CSC concept as well as providing insight into the therapeutical consequences of this model.
    PLoS Computational Biology 05/2011; 7(5):e1001132. · 4.87 Impact Factor
  • Doug Speed, Simon Tavaré
    ABSTRACT: This paper presents Sparse Partitioning, a Bayesian method for identifying predictors that either individually or in combination with others affect a response variable. The method is designed for regression problems involving binary or tertiary predictors and allows the number of predictors to exceed the size of the sample, two properties which make it well suited for association studies. Sparse Partitioning differs from other regression methods by placing no restrictions on how the predictors may influence the response. To compensate for this generality, Sparse Partitioning implements a novel way of exploring the model space. It searches for high posterior probability partitions of the predictor set, where each partition defines groups of predictors that jointly influence the response. The result is a robust method that requires no prior knowledge of the true predictor-response relationship. Testing on simulated data suggests Sparse Partitioning will typically match the performance of an existing method on a data set which obeys the existing method's model assumptions. When these assumptions are violated, Sparse Partitioning will generally offer superior performance.
    The Annals of Applied Statistics 01/2011; 5(2011). · 2.24 Impact Factor
  • ABSTRACT: Copy number abnormalities (CNAs) represent an important type of genetic mutation that can lead to abnormal cell growth and proliferation. New high-throughput sequencing technologies promise comprehensive characterization of CNAs. In contrast to microarrays, where probe design follows a carefully developed protocol, reads represent a random sample from a library and may be prone to representation biases due to GC content and other factors. The discrimination between true and false positive CNAs becomes an important issue. We present a novel approach, called CNAseg, to identify CNAs from second-generation sequencing data. It uses depth of coverage to estimate copy number states and flowcell-to-flowcell variability in cancer and normal samples to control the false positive rate. We tested the method using the COLO-829 melanoma cell line sequenced to 40-fold coverage. An extensive simulation scheme was developed to recreate different scenarios of copy number changes and depth of coverage by altering a real dataset with spiked-in CNAs. Comparison to alternative approaches using both real and simulated datasets showed that CNAseg achieves superior precision and improved sensitivity estimates. The CNAseg package and test data are available at http://www.compbio.group.cam.ac.uk/software.html.
    Bioinformatics 10/2010; 26(24):3051-8. · 5.47 Impact Factor
  • Sergii Ivakhno, Simon Tavaré
    ABSTRACT: The current generation of single nucleotide polymorphism (SNP) arrays allows measurement of copy number aberrations (CNAs) in cancer at more than one million locations in the genome in hundreds of tumour samples. Most research has focused on single-sample CNA discovery, the so-called segmentation problem. The availability of high-density, large sample-size SNP array datasets makes the identification of recurrent copy number changes in cancer an important issue that can be addressed using cross-sample information. We present a novel approach for finding regions of recurrent copy number aberrations, called CNAnova, from Affymetrix SNP 6.0 array data. The method derives its statistical properties from a control dataset composed of normal samples and, in contrast to previous methods, does not require segmentation and permutation steps. For rigorous testing of the algorithm and comparison to existing methods, we developed a simulation scheme that uses the noise distribution present in Affymetrix arrays. Application of the method to 128 acute lymphoblastic leukaemia samples shows that CNAnova achieves a lower error rate than a popular alternative approach. We also describe an extension of the CNAnova framework to identify recurrent CNA regions with intra-tumour heterogeneity, present in either primary or relapsed samples from the same patients. The CNAnova package and synthetic datasets are available at http://www.compbio.group.cam.ac.uk/software.html.
    Bioinformatics 06/2010; 26(11):1395-402. · 5.47 Impact Factor
  • A. D. Barbour, Simon Tavaré
    ABSTRACT: The dynamics of tumour evolution are not well understood. In this paper we provide a statistical framework for evaluating the molecular variation observed in different parts of a colorectal tumour. A multi-sample version of the Ewens Sampling Formula forms the basis for our modelling of the data, and we provide a simulation procedure for use in obtaining reference distributions for the statistics of interest. We also describe the large-sample asymptotics of the joint distributions of the variation observed in different parts of the tumour. While actual data should be evaluated with reference to the simulation procedure, the asymptotics serve to provide theoretical guidelines, for instance with reference to the choice of possible statistics. Comment: 22 pages, 1 figure. Chapter 4 of "Probability and Mathematical Genetics: Papers in Honour of Sir John Kingman" (Editors N.H. Bingham and C.M. Goldie), Cambridge University Press, 2010
  • EJC Supplements 01/2010; 8(5):208-208.
  • Sottoriva A, Tavaré S
    COMPSTAT 2010 -- Proceedings in Computational Statistics; 01/2010
  • ABSTRACT: The amplification of millions of single molecules in parallel can be performed on microscopic magnetic beads that are contained in aqueous compartments of an oil-buffer emulsion. These bead-emulsion amplification (BEA) reactions result in beads that are covered by almost-identical copies derived from a single template. The post-amplification analysis is performed using different fluorophore-labeled probes. We have identified BEA reaction conditions that efficiently produce longer amplicons of up to 450 base pairs. These conditions include the use of a Titanium Taq amplification system. Second, we explored alternate fluorophores coupled to probes for post-PCR DNA analysis. We demonstrate that four different Alexa fluorophores can be used simultaneously with extremely low crosstalk. Finally, we developed an allele-specific extension chemistry that is based on Alexa dyes to query individual nucleotides of the amplified material that is both highly efficient and specific.
    Analytical Chemistry 08/2009; 81(14):5770-6. · 5.70 Impact Factor
  • 01/2009: pages 33-37;
  • EJC Supplements 01/2008; 6(9):116-116.
  • ABSTRACT: We analysed 148 primary breast cancers using BAC-arrays containing 287 clones representing cancer-related gene/loci to obtain genomic molecular portraits. Gains were detected in 136 tumors (91.9%) and losses in 123 tumors (83.1%). Eight tumors (5.4%) did not have any genomic aberrations in the 281 clones analysed. Common (more than 15% of the samples) gains were observed at 8q11-qtel, 1q21-qtel, 17q11-q12 and 11q13, whereas common losses were observed at 16q12-qtel, 11ptel-p15.5, 1p36-ptel, 17p11.2-p12 and 8ptel-p22. Patients with tumors registering either less than 5% (median value) or less than 11% (third quartile) total copy number changes had a better overall survival (log-rank test: P=0.0417 and P=0.0375, respectively). Unsupervised hierarchical clustering based on copy number changes identified four clusters. Women with tumors from the cluster with amplification of three regions containing known breast oncogenes (11q13, 17q12 and 20q13) had a worse prognosis. The good prognosis group (Nottingham Prognostic Index (NPI) ≤3.4) tumors had frequent loss of 16q24-qtel. Genes significantly associated with estrogen receptor (ER), Grade and NPI were used to build k-nearest neighbor (KNN) classifiers that predicted ER, Grade and NPI status in the test set with an average misclassification rate of 24.7, 25.7 and 35.7%, respectively. These data raise the prospect of generating a molecular taxonomy of breast cancer based on copy number profiling using tumor DNA, which may be more generally applicable than expression microarray analysis.
    Oncogene 04/2007; 26(13):1959-70. · 8.56 Impact Factor
  • Simon Tavaré
    07/2006; ISBN: 9780470015902
  • Richard M Clark, Simon Tavaré, John Doebley
    ABSTRACT: To estimate a rate for single nucleotide substitutions for maize (Zea mays ssp. mays), we have taken advantage of data from genetic and archaeological studies of the domestication of maize from its wild ancestor, teosinte (Z. mays ssp. parviglumis). Genetic studies have shown that the teosinte branched1 (tb1) gene was a major target of human selection during maize domestication, and sequence diversity in the intergenic region 5' to the tb1-coding sequence is extraordinarily low. We show that polymorphism in this region is consistent with new mutation following fixation for a small number of tb1 haplotypes during domestication. Archaeological studies suggest that maize was domesticated approximately 6,250-10,000 years ago and subsequently the size of the maize population is thought to have expanded rapidly. Using the observed number of mutations within the region of selection at tb1, the approximate age of maize domestication, and approximations for the maize genealogy, we have derived estimates for the nucleotide substitution rate for the tb1 intergenic region. Using two approaches, one of which is a coalescent approach, we obtain rate estimates of approximately 2.9 × 10⁻⁸ and 3.3 × 10⁻⁸ substitutions per site per year. We also show that the pattern of polymorphism in the tb1 intergenic region appears to have been strongly affected by the mutagenic effect of DNA methylation. Excluding target sites of symmetric DNA methylation (CG and CNG sites) from analysis, the mutation rate estimates are reduced by approximately 50%-60%, while the rates for CG and CNG sites are nearly an order of magnitude higher. We use rate estimates from the tb1 region to estimate the timing of expansion of transposable elements in the maize genome and suggest that this expansion occurred primarily within the last million years.
    Molecular Biology and Evolution 12/2005; 22(11):2304-12. · 10.35 Impact Factor
  • ABSTRACT: Several tests of neutral evolution employ the observed number of segregating sites and properties of the haplotype frequency distribution as summary statistics and use simulations to obtain rejection probabilities. Here we develop a "haplotype configuration test" of neutrality (HCT) based on the full haplotype frequency distribution. To enable exact computation of rejection probabilities for small samples, we derive a recursion under the standard coalescent model for the joint distribution of the haplotype frequencies and the number of segregating sites. For larger samples, we consider simulation-based approaches. The utility of the HCT is demonstrated in simulations of alternative models and in application to data from Drosophila melanogaster.
    Genetics 04/2005; 169(3):1763-77. · 4.39 Impact Factor
  • Ofer Zeitouni, Simon Tavaré
    ABSTRACT: 1 Introduction; 1.1 Model; 1.2 Examples; 2 RWRE - d = 1; 2.1 Ergodic theorems; 2.2 CLT for ergodic environments; 2.3 Large deviations; 2.4 The subexponential regime; 2.5 Sinai … d > 1; 3.1 Ergodic theorems; 3.2 A law of large numbers in $\mathbb{Z}^d$; 3.3 CLT for walks in balanced environments; 3.4 Large deviations for nestling walks; 3.5 Kalikow's condition; References
    01/2004: pages 1930-1930;

Publication Stats

3k Citations
354.38 Total Impact Points


  • 2005–2011
    • University of Cambridge
      • Department of Oncology
      • School of Biological Sciences
      Cambridge, England, United Kingdom
  • 2010
    • Cancer Research UK Cambridge Institute
      Cambridge, England, United Kingdom
  • 2009
    • Leeds Trinity
      Leeds, England, United Kingdom
  • 1990–2006
    • University of Southern California
      • Department of Biological Sciences
      • Division of Molecular and Computational Biology
      • Department of Mathematics
      Los Angeles, California, United States
  • 1996
    • University of Colorado
      • Division of Pulmonary Sciences and Critical Care Medicine
      Denver, CO, United States
    • University of California, Los Angeles
      Los Angeles, California, United States
  • 1995
    • University of Illinois at Chicago
      Chicago, Illinois, United States
    • University of Idaho
      • Department of Mathematics
      Moscow, ID, United States
  • 1994–1995
    • Monash University (Australia)
      Melbourne, Victoria, Australia
  • 1987–1989
    • University of Utah
      • Department of Mathematics
      Salt Lake City, UT, United States
    • University College London
      • Department of Statistical Science
      London, ENG, United Kingdom
  • 1982–1984
    • Colorado State University
      • Department of Statistics
      Fort Collins, CO, United States
  • 1981–1982
    • Stanford University
      • Department of Mathematics
      Palo Alto, CA, United States