Analytical methods for inferring functional effects of single base pair substitutions in human cancers.

Department of Bioinformatics, Genentech, Inc., 1 DNA Way, M.S. 93, South San Francisco, CA 94080, USA.
Human Genetics (Impact Factor: 4.63). 06/2009; 126(4):481-98. DOI: 10.1007/s00439-009-0677-y
Source: PubMed

ABSTRACT Cancer is a genetic disease that results from a variety of genomic alterations. Identification of some of these causal genetic events has enabled the development of targeted therapeutics and spurred efforts to discover the key genes that drive cancer formation. Rapidly improving sequencing and genotyping technology continues to generate increasingly large datasets that require analytical methods to identify functional alterations that deserve additional investigation. This review examines statistical and computational approaches for the identification of functional changes among sets of single-nucleotide substitutions. Frequency-based methods identify the most highly mutated genes in large-scale cancer sequencing efforts while bioinformatics approaches are effective for independent evaluation of both non-synonymous mutations and polymorphisms. We also review current knowledge and tools that can be utilized for analysis of alterations in non-protein-coding genomic sequence.

  • ABSTRACT: Author Summary This work uses empirical single nucleotide variant data from the NHLBI Exome Sequencing Project to introduce a genome-wide scoring system that ranks human genes in terms of their intolerance to standing functional genetic variation in the human population. It is often inferred that genes carrying relatively fewer or relatively more common functional variants in healthy individuals may be judged respectively more or less likely to cause certain kinds of disease. We show that this intolerance score correlates remarkably well with genes already known to cause Mendelian diseases (P < 10⁻²⁶). Equally striking, however, are the differences in the relationship between standing genetic variation and disease-causing genes for different disease types. Considering disorder classes defined by the human disease network of Goh et al. (2007), we show a nearly opposite pattern for genes linked to developmental disorders and those linked to immunological disorders, with the former
    PLoS Genet. 08/2013; 9:e1003709.
  • ABSTRACT: Locus-specific databases (LSDBs) are curated compilations of sequence variants in genes associated with disease and have been invaluable tools for both basic and clinical research. These databases contain extensive information provided by the literature and benefit from manual curation by experts. Cancer genome sequencing projects have generated an explosion of data that are stored directly in centralized databases, thus possibly alleviating the need to develop independent LSDBs. A single cancer genome contains several thousand somatic mutations. However, only a handful of these mutations are truly oncogenic, and identifying them remains a challenge. Nevertheless, we can expect that this increase in data and the development of novel biocuration algorithms will ultimately result in more accurate curation and the release of stable sets of data. Using the evolution and content of the TP53 LSDB as a paradigm, it is possible to draw a model of gene mutation analysis covering initial descriptions, the accumulation and organization of knowledge in databases, and the use of this knowledge in clinical practice. It is also possible to make several assumptions on the future of LSDBs and how centralized databases could change the accessibility of data, with interfaces optimized for different types of users and adapted to the specificity of each region of the genome, coding or non-coding, associated with tumor development.
    Human Mutation 01/2014; · 5.21 Impact Factor
  • ABSTRACT: Interpreting variants, especially noncoding ones, in the increasing number of personal genomes is challenging. We used patterns of polymorphisms in functionally annotated regions in 1092 humans to identify deleterious variants; then we experimentally validated candidates. We analyzed both coding and noncoding regions, with the former corroborating the latter. We found regions particularly sensitive to mutations ("ultrasensitive") and variants that are disruptive because of mechanistic effects on transcription-factor binding (that is, "motif-breakers"). We also found variants in regions with higher network centrality tend to be deleterious. Insertions and deletions followed a similar pattern to single-nucleotide variants, with some notable exceptions (e.g., certain deletions and enhancers). On the basis of these patterns, we developed a computational tool (FunSeq), whose application to ~90 cancer genomes reveals nearly a hundred candidate noncoding drivers.
    Science 10/2013; 342(6154):1235587. · 31.20 Impact Factor

