Lincoln Stein

University of Toronto, Toronto, Ontario, Canada

Publications (196) · 2,062.96 Total impact

  • ABSTRACT: Noonan syndrome (NS) is a relatively common genetic disorder, characterized by typical facies, short stature, developmental delay, and cardiac abnormalities. Known causative genes account for 70-80% of clinically diagnosed NS patients, but the genetic basis for the remaining 20-30% of cases is unknown. We performed next-generation sequencing on germ-line DNA from 27 NS patients lacking a mutation in the known NS genes. We identified gain-of-function alleles in Ras-like without CAAX 1 (RIT1) and mitogen-activated protein kinase kinase 1 (MAP2K1) and previously unseen loss-of-function variants in RAS p21 protein activator 2 (RASA2) that are likely to cause NS in these patients. Expression of the mutant RASA2, MAP2K1, or RIT1 alleles in heterologous cells increased RAS-ERK pathway activation, supporting a causative role in NS pathogenesis. Two patients had more than one disease-associated variant. Moreover, the diagnosis of an individual initially thought to have NS was revised to neurofibromatosis type 1 based on an NF1 nonsense mutation detected in this patient. Another patient harbored a missense mutation in NF1 that resulted in decreased protein stability and impaired ability to suppress RAS-ERK activation; however, this patient continues to exhibit an NS-like phenotype. In addition, a nonsense mutation in RPS6KA3 was found in one patient initially diagnosed with NS whose diagnosis was later revised to Coffin-Lowry syndrome. Finally, we identified other potential candidates for new NS genes, as well as potential carrier alleles for unrelated syndromes. Taken together, our data suggest that next-generation sequencing can provide a useful adjunct to RASopathy diagnosis and emphasize that the standard clinical categories for RASopathies might not be adequate to describe all patients.
    Proceedings of the National Academy of Sciences of the United States of America. 07/2014;
  • ABSTRACT: Tumors often contain multiple, genetically distinct subpopulations of cancerous cells. These so-called subclonal populations are defined by distinct somatic mutations that include point mutations such as single nucleotide variants and small indels (collectively called simple somatic mutations, or SSMs) as well as larger structural changes that result in copy number variations (CNVs). In some cases, the genotype and prevalence of these subpopulations can be reconstructed based on high-throughput, short-read sequencing of DNA in one or more tumor samples. To date, no automated SSM-based subclonal reconstructions have been attempted on whole-genome sequencing (WGS) data, and CNV-based reconstructions are limited to tumors with two or fewer cancerous subclonal populations and with a small number of CNVs. We describe a new automated method, PhyloWGS, that can be applied to WGS data from one or more tumor samples to perform subclonal reconstruction based on both CNVs and SSMs. PhyloWGS successfully recovers the composition of mixtures of a highly rearranged TCGA cell line when a CNV-based method fails. On WGS data with average read depth of 40 from five time-series chronic lymphocytic leukemia samples, PhyloWGS recovers the same tumor phylogeny previously reconstructed using deep targeted resequencing. To further explore the limits of WGS-based subclonal reconstruction, we ran PhyloWGS on simulated data: PhyloWGS can reliably reconstruct as many as three cancerous subpopulations based on 30-50x coverage WGS data from a single tumor sample with tens to thousands of SSMs per subpopulation. At least five cancerous subpopulations can be reconstructed if provided with read depths of 200 or more.
  • Liya Wang, Lincoln Stein, Doreen Ware
    ABSTRACT: The average size of internal translated exons, ranging from 120 to 165 nt across metazoans, is approximately the size of the typical mononucleosome (147 nt). Genome-wide studies have also shown that nucleosome occupancy is significantly higher in exons than in introns, which might indicate that the evolution of exon size is related to nucleosome occupancy. By grouping exons by the GC content of their flanking introns, we show that the average exon size is positively correlated with exon GC content. Using the sequencing data from direct mapping of Homo sapiens nucleosomes with limited nuclease digestion, we show that the level of nucleosome occupancy is also positively correlated with the exon GC content in a similar fashion. We then demonstrate that exon size is positively correlated with nucleosome occupancy. The strong correlation between exon size and nucleosome occupancy suggests that chromatin organization may be related to the evolution of exon sizes.
  • Abigail Cabunoc, Todd W. Harris, Lincoln D. Stein
    ABSTRACT: Background / Purpose: The WormBase website, a highly curated central data repository for Caenorhabditis, has added several features to aid in the biocuration process. Main conclusion: WormBase now supports real-time updates for specific data, as well as custom views and tools available only to WormBase curators, while also engaging the community in the curation process.
    Biocuration 2014; 04/2014
  • Nature Genetics 03/2014; 46(4):318-9. · 35.21 Impact Factor
  • ABSTRACT: We demonstrate a flexible genome-wide association study platform built upon the iPlant Collaborative Cyber-infrastructure. The platform supports big data management, sharing, and large-scale study of both genotype and phenotype data on clusters. End users can add their own analysis tools and create customized analysis workflows through the graphical user interfaces in both the iPlant Discovery Environment and the BioExtract server. Copyright © 2014 John Wiley & Sons, Ltd.
    Concurrency and Computation Practice and Experience 03/2014; · 0.85 Impact Factor
  • ABSTRACT: In acute myeloid leukaemia (AML), the cell of origin, nature and biological consequences of initiating lesions, and order of subsequent mutations remain poorly understood, as AML is typically diagnosed without observation of a pre-leukaemic phase. Here, highly purified haematopoietic stem cells (HSCs), progenitor and mature cell fractions from the blood of AML patients were found to contain recurrent DNMT3A mutations (DNMT3A(mut)) at high allele frequency, but without coincident NPM1 mutations (NPM1c) present in AML blasts. DNMT3A(mut)-bearing HSCs showed a multilineage repopulation advantage over non-mutated HSCs in xenografts, establishing their identity as pre-leukaemic HSCs. Pre-leukaemic HSCs were found in remission samples, indicating that they survive chemotherapy. Therefore, DNMT3A(mut) arises early in AML evolution, probably in HSCs, leading to a clonally expanded pool of pre-leukaemic HSCs from which AML evolves. Our findings provide a paradigm for the detection and treatment of pre-leukaemic clones before the acquisition of additional genetic lesions engenders greater therapeutic resistance.
    Nature 02/2014; · 38.60 Impact Factor
  • ABSTRACT: High-throughput sequencing allows the detection and quantification of frequencies of somatic single nucleotide variants (SNV) in heterogeneous tumor cell populations. In some cases, the evolutionary history and population frequency of the subclonal lineages of tumor cells present in the sample can be reconstructed from these SNV frequency measurements. But automated methods to do this reconstruction are not available and the conditions under which reconstruction is possible have not been described. We describe the conditions under which the evolutionary history can be uniquely reconstructed from SNV frequencies from single or multiple samples from the tumor population and we introduce a new statistical model, PhyloSub, that infers the phylogeny and genotype of the major subclonal lineages represented in the population of cancer cells. It uses a Bayesian nonparametric prior over trees that groups SNVs into major subclonal lineages and automatically estimates the number of lineages and their ancestry. We sample from the joint posterior distribution over trees to identify evolutionary histories and cell population frequencies that have the highest probability of generating the observed SNV frequency data. When multiple phylogenies are consistent with a given set of SNV frequencies, PhyloSub represents the uncertainty in the tumor phylogeny using a "partial order plot." Experiments on a simulated dataset and two real datasets comprising tumor samples from acute myeloid leukemia and chronic lymphocytic leukemia patients demonstrate that PhyloSub can infer both linear (or chain) and branching lineages and its inferences are in good agreement with ground truth, where it is available. PhyloSub can be applied to frequencies of any "binary" somatic mutation, including SNVs as well as small insertions and deletions. The PhyloSub and partial order plot software is available from
    BMC Bioinformatics 02/2014; 15(1):35. · 3.02 Impact Factor
  • ABSTRACT: Studies indicate that high-grade serous ovarian carcinoma (HGSOC), the most common epithelial ovarian carcinoma histotype, originates from the fallopian tube epithelium (FTE). Risk factors for this cancer include reproductive parameters associated with lifetime ovulatory events. Ovulation is an acute inflammatory process during which the FTE is exposed to follicular fluid containing both pro- and anti-inflammatory molecules, such as interleukin-1 (IL1), tumor necrosis factor (TNF), and cortisol. Repeated exposure to inflammatory cytokines may contribute to transforming events in the FTE, with glucocorticoids exerting a protective effect. The global response of FTE cells to inflammatory cytokines or glucocorticoids has not been investigated. To examine the response of FTE cells and the ability of glucocorticoids to oppose this response, an immortalized human FTE cell line, OE-E6/E7, was treated with IL1β, dexamethasone (DEX), IL1β and DEX, or vehicle and genome-wide gene expression profiling was performed. IL1β altered the expression of 47 genes of which 17 were reversed by DEX. DEX treatment alone altered the expression of 590 genes, whereas combined DEX and IL1β treatment altered the expression of 784 genes. Network and pathway enrichment analysis indicated that many genes altered by DEX are involved in cytokine, chemokine, and cell cycle signaling, including NFκB target genes and interacting proteins. Quantitative real time RT-PCR studies validated the gene array data for IL8, IL23A, PI3 and TACC2 in OE-E6/E7 cells. Consistent with the array data, Western blot analysis showed increased levels of PTGS2 protein induced by IL1β that was blocked by DEX. A parallel experiment using primary cultured human FTE cells indicated similar effects on PTGS2, IL8, IL23A, PI3 and TACC2 transcripts. These findings support the hypothesis that pro-inflammatory signaling is induced in FTE cells by inflammatory mediators and raise the possibility that dysregulation of glucocorticoid signaling could contribute to increased risk for HGSOC.
    PLoS ONE 01/2014; 9(5):e97997. · 3.53 Impact Factor
  • ABSTRACT: High-throughput experiments are routinely performed in modern biological studies. However, extracting meaningful results from massive experimental data sets is a challenging task for biologists. Projecting data onto pathway and network contexts is a powerful way to unravel patterns embedded in seemingly scattered large data sets and assist knowledge discovery related to cancer and other complex diseases. We have developed a Cytoscape app called "ReactomeFIViz", which utilizes a highly reliable gene functional interaction network combined with human-curated pathways derived from Reactome and other pathway databases. This app provides a suite of features to assist biologists in performing pathway- and network-based data analysis in a biologically intuitive and user-friendly way. Biologists can use this app to uncover network and pathway patterns related to their studies, search for gene signatures from gene expression data sets, reveal pathways significantly enriched by genes in a list, and integrate multiple genomic data types into a pathway context using probabilistic graphical models. We believe our app will give researchers substantial power to analyze intrinsically noisy high-throughput experimental data to find biologically relevant information.
    F1000Research. 01/2014; 3:146.
  • ABSTRACT: Reactome is a manually curated, open-source, open-data resource of human pathways and reactions. The current version 46 describes 7088 human proteins (34% of the predicted human proteome), participating in 6744 reactions based on data extracted from 15 107 research publications with PubMed links. The Reactome Web site and analysis tool set have been completely redesigned to increase speed, flexibility and user friendliness. The data model has been extended to support annotation of disease processes due to infectious agents and to mutation.
    Nucleic Acids Research 11/2013; · 8.81 Impact Factor
  • ABSTRACT: Gramene is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework for genome comparison and the use of ontologies to integrate structural and functional annotation data. Whole-genome alignments complemented by phylogenetic gene family trees help infer syntenic and orthologous relationships. Genetic variation data, sequences and genome mappings available for 10 species, including Arabidopsis, rice and maize, help infer putative variant effects on genes and transcripts. The pathways section also hosts 10 species-specific metabolic pathways databases developed in-house or by our collaborators using Pathway Tools software, which facilitates searches for pathway, reaction and metabolite annotations, and allows analyses of user-defined expression datasets. Recently, we released a Plant Reactome portal featuring 133 curated rice pathways. This portal will be expanded for Arabidopsis, maize and other plant species. We continue to provide genetic and QTL maps and marker datasets developed by crop researchers. The project provides a unique community platform to support scientific research in plant genomics including studies in evolution, genetics, plant breeding, molecular biology, biochemistry and systems biology.
    Nucleic Acids Research 11/2013; · 8.81 Impact Factor
  • ABSTRACT: WormBase is a highly curated resource dedicated to supporting research using the model organism Caenorhabditis elegans. With an electronic history predating the World Wide Web, WormBase contains information ranging from the sequence and phenotype of individual alleles to genome-wide studies generated using next-generation sequencing technologies. In recent years, we have expanded the contents to include data on additional nematodes of agricultural and medical significance, bringing the knowledge of C. elegans to bear on these systems and providing support for underserved research communities. Manual curation of the primary literature remains a central focus of the WormBase project, providing users with reliable, up-to-date and highly cross-linked information. In this update, we describe efforts to organize the original atomized and highly contextualized curated data into integrated syntheses of discrete biological topics. Next, we discuss our experiences coping with the vast increase in available genome sequences made possible through next-generation sequencing platforms. Finally, we describe some of the features and tools of the new WormBase Web site that help users better find and explore data of interest.
    Nucleic Acids Research 11/2013; · 8.81 Impact Factor
  • ABSTRACT: A locus on human chromosome 11q23 tagged by marker rs3802842 was associated with colorectal cancer in a genome-wide association study; this finding has been replicated in case-control studies worldwide. In order to identify biologic factors at this locus that are related to the etiopathology of colorectal cancer, we used microarray-based target selection methods, coupled to next-generation sequencing, to study 103 kb at the 11q23 locus. We genotyped 369 putative variants from 1030 patients with colorectal cancer (cases) and 1061 individuals without colorectal cancer (controls) from the Ontario Familial Colorectal Cancer Registry. Two previously uncharacterized genes, COLCA1 and COLCA2, were found to be co-regulated genes that are transcribed from opposite strands. Expression levels of COLCA1 and COLCA2 transcripts correlate with rs3802842 genotypes. In colon tissues, COLCA1 co-localizes with crystalloid granules of eosinophils and granular organelles of mast cells, neutrophils, macrophages, dendritic cells, and differentiated myeloid-derived cell lines. COLCA2 is present in the cytoplasm of normal epithelial, immune, and other cell lineages, as well as tumor cells. Tissue microarray analysis demonstrates the association of rs3802842 with lymphocyte density in the lamina propria (P = .014) and levels of COLCA1 in the lamina propria (P = .00016) and COLCA2 (tumor cells, P = .0041 and lamina propria, P = 6 × 10⁻⁵). In conclusion, genetic, expression, and immunohistochemical data implicate COLCA1 and COLCA2 in the pathogenesis of colon cancer. Histologic analyses indicate the involvement of immune pathways. © 2013 Wiley Periodicals, Inc.
    International Journal of Cancer 10/2013; · 6.20 Impact Factor
  • ABSTRACT: Web Apollo is the first instantaneous, collaborative genomic annotation editor available on the web. One of the natural consequences following from current advances in sequencing technology is that there are more and more researchers sequencing new genomes. These researchers require tools to describe the functional features of their newly sequenced genomes. With Web Apollo researchers can use any of the common browsers (for example, Chrome or Firefox) to jointly analyze and precisely describe the features of a genome in real time, whether they are in the same room or working from opposite sides of the world.
    Genome Biology 08/2013; 14(8):R93. · 10.30 Impact Factor
  • ABSTRACT: New strategies to combat complex human disease require systems approaches to biology that integrate experiments from cell lines, primary tissues and model organisms. We have developed Pathprint, a functional approach that compares gene expression profiles in a set of pathways, networks and transcriptionally-regulated targets. It can be applied universally to gene expression profiles across species. Integration of large-scale profiling methods and curation of the public repository overcomes platform, species and batch effects to yield a standard measure of functional distance between experiments. We show that Pathprints combine mouse and human blood developmental lineages, and develop new prognostic indicators in acute myeloid leukemia. The code and resources are available at
    Genome Medicine 07/2013; 5(7):68. · 4.94 Impact Factor
  • ABSTRACT: Funded by the National Institutes of Health (NIH), the aim of the Model Organism ENCyclopedia of DNA Elements (modENCODE) project is to provide the biological research community with a comprehensive encyclopedia of functional genomic elements for the model organisms C. elegans (worm) and D. melanogaster (fly). With a total size of just under 10 terabytes of data collected and released to the public, one of the challenges faced by researchers is to extract biologically meaningful knowledge from this large data set. While the basic quality control, pre-processing, and analysis of the data has already been performed by members of the modENCODE consortium, many researchers will wish to reinterpret the data set using modifications and enhancements of the original protocols, or combine modENCODE data with other data sets. Unfortunately this can be a time consuming and logistically challenging proposition. In recognition of this challenge, the modENCODE DCC has released uniform computing resources for analyzing modENCODE data on Galaxy, on the public Amazon Cloud, and on the private Bionimbus Cloud for genomic research. In particular, we have released Galaxy workflows for interpreting ChIP-seq data which use the same quality control (QC) and peak calling standards adopted by the modENCODE and ENCODE communities. For convenience of use, we have created Amazon and Bionimbus Cloud machine images containing Galaxy along with all the modENCODE data, software and other dependencies. Using these resources provides a framework for running consistent and reproducible analyses on modENCODE data, ultimately allowing researchers to spend more of their time using modENCODE data, and less time moving it around.
    BMC Genomics 07/2013; 14(1):494. · 4.40 Impact Factor
  • Nature Methods 07/2013; 10(8):723-729. · 23.57 Impact Factor
  • ABSTRACT: Model organisms are widely used for understanding basic biology, and have significantly contributed to the study of human disease. In recent years, genomic analysis has provided extensive evidence of widespread conservation of gene sequence and function amongst eukaryotes, allowing insights from model organisms to help decipher gene function in a wider range of species. The InterMOD consortium is developing an infrastructure based around the InterMine data warehouse system to integrate genomic and functional data from a number of key model organisms, leading the way to improved cross-species research. So far including budding yeast, nematode worm, fruit fly, zebrafish, rat and mouse, the project has set up data warehouses, synchronized data models, and created analysis tools and links between data from different species. The project unites a number of major model organism databases, improving both the consistency and accessibility of comparative research, to the benefit of the wider scientific community.
    Scientific Reports 05/2013; 3:1802. · 5.08 Impact Factor

Publication Stats

20k Citations
2,062.96 Total Impact Points


  • 2014
    • University of Toronto
      Toronto, Ontario, Canada
  • 2008–2014
    • Ontario Institute for Cancer Research
      Toronto, Ontario, Canada
    • University of California, Los Angeles
      • Department of Human Genetics
      Los Angeles, CA, United States
    • University of Delaware
      • Department of Animal and Food Sciences
      Newark, DE, United States
  • 1970–2013
    • Cold Spring Harbor Laboratory
      Cold Spring Harbor, New York, United States
  • 2011
    • Stony Brook University
      • Department of Biomedical Engineering
      Stony Brook, NY, United States
  • 2006–2011
    • California Institute of Technology
      • Division of Biology
      Pasadena, CA, United States
    • University College Dublin
      Dublin, Leinster, Ireland
  • 2010
    • Yale University
      • Department of Computational Biology and Bioinformatics
      New Haven, CT, United States
    • The University of Arizona
      • Department of Ecology and Evolutionary Biology
      Tucson, AZ, United States
  • 2007–2010
    • Broad Institute of MIT and Harvard
      • Program in Medical and Population Genetics
      Cambridge, Massachusetts, United States
    • EMBL-EBI
      Cambridge, England, United Kingdom
    • The Scripps Research Institute
      La Jolla, California, United States
  • 2009
    • University of Oxford
      • Wellcome Trust Centre for Human Genetics
      Oxford, England, United Kingdom
    • Lawrence Berkeley National Laboratory
      Berkeley, California, United States
  • 2008–2009
    • Wellcome Trust Sanger Institute
      Cambridge, England, United Kingdom
  • 2003–2009
    • Cornell University
      • Department of Plant Breeding and Genetics
      Ithaca, New York, United States
  • 1994–1999
    • Whitehead Institute for Biomedical Research
      Cambridge, Massachusetts, United States
  • 1998
    • The Jackson Laboratory
      Bar Harbor, Maine, United States