Analytical methods for inferring functional effects of single base pair substitutions in human cancers.

Department of Bioinformatics, Genentech, Inc., 1 DNA Way, M.S. 93, South San Francisco, CA 94080, USA.
Human Genetics (Impact Factor: 4.52). 06/2009; 126(4):481-98. DOI: 10.1007/s00439-009-0677-y
Source: PubMed

ABSTRACT Cancer is a genetic disease that results from a variety of genomic alterations. Identification of some of these causal genetic events has enabled the development of targeted therapeutics and spurred efforts to discover the key genes that drive cancer formation. Rapidly improving sequencing and genotyping technology continues to generate increasingly large datasets that require analytical methods to identify functional alterations that deserve additional investigation. This review examines statistical and computational approaches for the identification of functional changes among sets of single-nucleotide substitutions. Frequency-based methods identify the most highly mutated genes in large-scale cancer sequencing efforts while bioinformatics approaches are effective for independent evaluation of both non-synonymous mutations and polymorphisms. We also review current knowledge and tools that can be utilized for analysis of alterations in non-protein-coding genomic sequence.

  • ABSTRACT: Locus-specific databases (LSDBs) are curated compilations of sequence variants in genes associated with disease and have been invaluable tools for both basic and clinical research. These databases contain extensive information provided by the literature and benefit from manual curation by experts. Cancer genome sequencing projects have generated an explosion of data that are stored directly in centralized databases, thus possibly alleviating the need to develop independent LSDBs. A single cancer genome contains several thousand somatic mutations. However, only a handful of these mutations are truly oncogenic and identifying them remains a challenge. Nevertheless, we can expect that this increase in data and the development of novel biocuration algorithms will ultimately result in more accurate curation and the release of stable sets of data. Using the evolution and content of the TP53 LSDB as a paradigm, it is possible to draw a model of gene mutation analysis covering initial descriptions, the accumulation and organization of knowledge in databases and the use of this knowledge in clinical practice. It is also possible to make several assumptions on the future of LSDBs and how centralized databases could change the accessibility of data, with interfaces optimized for different types of users and adapted to the specificity of each region of the genome, coding or non-coding, associated with tumor development.
    Human Mutation 01/2014; · 5.05 Impact Factor
  • ABSTRACT: A central challenge in interpreting personal genomes is determining which mutations most likely influence disease. Although progress has been made in scoring the functional impact of individual mutations, the characteristics of the genes in which those mutations are found remain largely unexplored. For example, genes known to carry few common functional variants in healthy individuals may be judged more likely to cause certain kinds of disease than genes known to carry many such variants. Until now, however, it has not been possible to develop a quantitative assessment of how well genes tolerate functional genetic variation on a genome-wide scale. Here we describe an effort that uses sequence data from 6503 whole exome sequences made available by the NHLBI Exome Sequencing Project (ESP). Specifically, we develop an intolerance scoring system that assesses whether genes have relatively more or less functional genetic variation than expected based on the apparently neutral variation found in the gene. To illustrate the utility of this intolerance score, we show that genes responsible for Mendelian diseases are significantly more intolerant to functional genetic variation than genes that do not cause any known disease, but with striking variation in intolerance among genes causing different classes of genetic disease. We conclude by showing that use of an intolerance ranking system can aid in interpreting personal genomes and identifying pathogenic mutations.
    PLoS Genetics 08/2013; 9(8):e1003709. · 8.17 Impact Factor
  • ABSTRACT: Interpreting variants, especially noncoding ones, in the increasing number of personal genomes is challenging. We used patterns of polymorphisms in functionally annotated regions in 1092 humans to identify deleterious variants; then we experimentally validated candidates. We analyzed both coding and noncoding regions, with the former corroborating the latter. We found regions particularly sensitive to mutations ("ultrasensitive") and variants that are disruptive because of mechanistic effects on transcription-factor binding (that is, "motif-breakers"). We also found that variants in regions with higher network centrality tend to be deleterious. Insertions and deletions followed a similar pattern to single-nucleotide variants, with some notable exceptions (e.g., certain deletions and enhancers). On the basis of these patterns, we developed a computational tool (FunSeq), whose application to ~90 cancer genomes reveals nearly a hundred candidate noncoding drivers.
    Science 10/2013; 342(6154):1235587. · 31.48 Impact Factor
