The 1000 Genomes Project Consortium: The 1000 Genomes Project: data management and community access

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Nature Methods (Impact Factor: 32.07). 04/2012; 9(5):459-62. DOI: 10.1038/nmeth.1974
Source: PubMed


The 1000 Genomes Project was launched as one of the largest distributed data collection and analysis projects ever undertaken in biology. In addition to the primary scientific goals of creating both a deep catalog of human genetic variation and extensive methods to accurately discover and characterize variation using new sequencing technologies, the project makes all of its data publicly available. Members of the project data coordination center have developed and deployed several tools to enable widespread data access.



Available from: Eugene Kulesha, Oct 13, 2015
    • "Other sequence detection methods are based on magnetic tweezers (Linnarsson, 2012) or on other approaches, such as nanopore sequencing analysis, in which single molecules of DNA can be deciphered as they pass through a tiny channel (Pennisi, 2012). All these techniques have facilitated whole-genome or exome sequencing at an unprecedented scale, thus allowing the launch of initiatives such as the 1000 Genomes Project, which seeks to analyze DNA variations in human populations (Clarke et al., 2012). It has been suggested that each genome contains 1.5 × 10^5 new single nucleotide variants (SNVs) that are not present in the dbSNP database (Pelak et al., 2010)."
    ABSTRACT: It is assumed that DNA sequences are conserved in the diverse cell types present in a multicellular organism like the human being. Thus, in order to compare the sequences in the genome of DNA from different individuals, nucleic acid is commonly isolated from a single tissue. In this regard, blood cells are widely used for this purpose because of their availability. Thus blood DNA has been used to study familial genetic diseases that affect other tissues and organs, such as the liver, heart, and brain. While this approach is valid for the identification of familial diseases in which mutations are present in parental germinal cells and, therefore, in all the cells of a given organism, it is not suitable to identify sporadic diseases in which mutations might occur in specific somatic cells. This review addresses somatic DNA variations in different tissues or cells (mainly in the brain) of single individuals and discusses whether the dogma of DNA invariance between cell types is indeed correct. We will also discuss how single nucleotide somatic variations arise, focusing on the presence of specific DNA mutations in the brain.
    Frontiers in Aging Neuroscience 11/2014; 6:323. DOI:10.3389/fnagi.2014.00323 · 4.00 Impact Factor
    • "The goal of our study was to develop a protocol to detect exonic CNVs (including CNVs that cover 1 to 4 exons) from exome sequencing data by combining computational prediction algorithms and a high-resolution custom CGH array. In this study, we predicted CNVs in 30 exomes obtained from the 1000 genomes project [6] using six recently published CNV detection programs along with an in-house modified algorithm. Computational CNV predictions were then tested using a custom CGH array focused on exonic regions. "
    ABSTRACT: With advances in next generation sequencing technologies and genomic capture techniques, exome sequencing has become a cost-effective approach for mutation detection in genetic diseases. However, computational prediction of copy number variants (CNVs) from exome sequence data is a challenging task. Whilst numerous programs are available, they have different sensitivities, and have low sensitivity to detect smaller CNVs (1–4 exons). Additionally, exonic CNV discovery using standard aCGH has limitations due to the low probe density over exonic regions. The goal of our study was to develop a protocol to detect exonic CNVs (including shorter CNVs that cover 1–4 exons), combining computational prediction algorithms and a high-resolution custom CGH array. We used six published CNV prediction programs (ExomeCNV, CONTRA, ExomeCopy, ExomeDepth, CoNIFER, XHMM) and an in-house modification to ExomeCopy and ExomeDepth (ExCopyDepth) for computational CNV prediction on 30 exomes from the 1000 genomes project and 9 exomes from primary immunodeficiency patients. CNV predictions were tested using a custom CGH array designed to capture all exons (exaCGH). After this validation, we next evaluated the computational prediction of shorter CNVs. ExomeCopy and the in-house modified algorithm, ExCopyDepth, showed the highest capability in detecting shorter CNVs. Finally, the performance of each computational program was assessed by calculating the sensitivity and false positive rate. In this paper, we assessed the ability of 6 computational programs to predict CNVs, focussing on short (1–4 exon) CNVs. We also tested these predictions using a custom array targeting exons. Based on these results, we propose a protocol to identify and confirm shorter exonic CNVs combining computational prediction algorithms and custom aCGH experiments.
    BMC Genomics 08/2014; 15(1):661. DOI:10.1186/1471-2164-15-661 · 3.99 Impact Factor
    • "We used the derived frequencies of 1000 Genomes single nucleotide polymorphisms (SNP; Clarke et al., 2012; Haerty and Ponting, 2013) to test for differences in selective constraint within the human lineage between ancestral repeats (AR), constitutive 3′UTRs and the longest alternative 3′UTRs across all tissues, as described in (Chen and Rajewsky, 2006; Marques and Ponting, 2009). We considered SNPs with frequency f < 10% to be rare and intermediate if 10% < f < 90%. "
    ABSTRACT: Why protein-coding genes express transcripts with longer 3'untranslated regions (3'UTRs) in the brain rather than in other tissues remains poorly understood. Given the established role of 3'UTRs in post-transcriptional regulation of transcript abundance and their recently highlighted contributions to miRNA-mediated cross-talk between mRNAs, we hypothesized that 3'UTR lengthening enhances coordinated expression between functionally-related genes in the brain. To test this hypothesis, we annotated 3'UTRs of human brain-expressed genes and found that transcripts encoding ion channels or transporters are specifically enriched among those genes expressing their longest 3'UTR extension in this tissue. These 3'UTR extensions have high density of response elements predicted for those miRNAs that are specifically expressed in the human frontal cortex (FC). Importantly, these miRNA response elements are more frequently shared among ion channel/transporter-encoding mRNAs than expected by chance. This indicates that miRNA-mediated cross-talk accounts, at least in part, for the observed coordinated expression of ion channel/transporter genes in the adult human brain. We conclude that extension of these genes' 3'UTRs enhances the miRNA-mediated cross-talk among their transcripts which post-transcriptionally regulates their mRNAs' relative levels.
    Frontiers in Genetics 02/2014; 5:41. DOI:10.3389/fgene.2014.00041