The 1000 Genomes Project Consortium: The 1000 Genomes Project: data management and community access

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Nature Methods (Impact Factor: 32.07). 04/2012; 9(5):459-62. DOI: 10.1038/nmeth.1974
Source: PubMed


The 1000 Genomes Project was launched as one of the largest distributed data collection and analysis projects ever undertaken in biology. In addition to the primary scientific goals of creating both a deep catalog of human genetic variation and extensive methods to accurately discover and characterize variation using new sequencing technologies, the project makes all of its data publicly available. Members of the project data coordination center have developed and deployed several tools to enable widespread data access.

    • "heightened security, rapid scalability, dynamic allocation of services, and flexible costing, and can in principle ease collaboration between dispersed located research groups by using a shared environment on a 'pay-as-you-go' basis (Zhao et al, 2013). The 1000 Genomes Project, which catalogues human sequence variation through deep sequencing of the genomes of over 1000 individuals worldwide, uses a 200 TB Amazon cloud-based data repository solution (Clarke et al, 2012). Commercial cloud storage solutions are also provided by Google and Microsoft, and have been used by many research institutes worldwide, namely the NIH and the European Bioinformatics Institute."
    ABSTRACT: In the past decade, cancer research has seen an increasing trend towards high-throughput techniques and translational approaches. The increasing availability of assays that utilise smaller quantities of source material and produce higher volumes of data output have resulted in the necessity for data storage solutions beyond those previously used. Multifactorial data, both large in sample size and heterogeneous in context, needs to be integrated in a standardised, cost-effective and secure manner. This requires technical solutions and administrative support not normally financially accounted for in small- to moderate-sized research groups. In this review, we highlight the Big Data challenges faced by translational research groups in the precision medicine era; an era in which the genomes of over 75 000 patients will be sequenced by the National Health Service over the next 3 years to advance healthcare. In particular, we have looked at three main themes of data management in relation to cancer research, namely (1) cancer ontology management, (2) IT infrastructures that have been developed to support data management and (3) the unique ethical challenges introduced by utilising Big Data in research. British Journal of Cancer advance online publication, 22 October 2015; doi:10.1038/bjc.2015.341
    British Journal of Cancer 10/2015; 113(10). DOI:10.1038/bjc.2015.341 · 4.84 Impact Factor
    • "We analysed publically available cancer genome datasets at the cBioPortal for Cancer Genomics (Supplementary data) providing access to data from 20,958 tumour samples from 89 cancer studies (data available up to 21st January 2015) [22] [23] and data from the 1000 Genome project [24] [25] [26] to identify mutations, copy-number alterations and mRNA expression levels (using a mRNA expression z-score threshold value of ± 2.0). "
    ABSTRACT: FAM72 is a novel neuronal progenitor cell (NPC) self-renewal supporting protein expressed under physiological conditions at low levels in other tissues. Accumulating data indicate the potential pivotal tumourigenic effects of FAM72. Our in silico human genome-wide analysis (GWA) revealed that the FAM72 gene family consists of four human-specific paralogous members, all of which are located on chromosome (chr) 1. Unique asymmetric FAM72 segmental gene duplications are most likely to have occurred in conjunction with the paired genomic neighbour SRGAP2 (SLIT-ROBO Rho GTPase activating protein), as both genes have four paralogues in humans but only one vertebra-emerging orthologue in all other species. No species with two or three FAM72/SRGAP2 gene pairs could be identified, and the four exclusively human-defining ohnologues, with different mutation patterns in Homo neanderthalensis and Denisova hominin, may remain under epigenetic control through long non-coding (lnc) RNAs.
    Genomics 10/2015; 106:278-285. · 2.28 Impact Factor
    • "Other sequence detection methods are based on magnetic tweezers (Linnarsson, 2012) or on other approaches, such as nanopore sequencing analysis, in which single molecules of DNA can be deciphered as they pass through a tiny channel (Pennisi, 2012). All these techniques have facilitated whole-genome or exome sequencing at an unprecedented scale, thus allowing the launch of initiatives such as the 1000 Genomes Project, which seeks to analyze DNA variations in human populations (Clarke et al., 2012). It has been suggested that each genome contains 1.5 × 10^5 new single nucleotide variants (SNVs) that are not present in the dbSNP database (Pelak et al., 2010). "
    ABSTRACT: It is assumed that DNA sequences are conserved in the diverse cell types present in a multicellular organism like the human being. Thus, in order to compare the sequences in the genome of DNA from different individuals, nucleic acid is commonly isolated from a single tissue. In this regard, blood cells are widely used for this purpose because of their availability. Thus blood DNA has been used to study genetic familiar diseases that affect other tissues and organs, such as the liver, heart, and brain. While this approach is valid for the identification of familial diseases in which mutations are present in parental germinal cells and, therefore, in all the cells of a given organism, it is not suitable to identify sporadic diseases in which mutations might occur in specific somatic cells. This review addresses somatic DNA variations in different tissues or cells (mainly in the brain) of single individuals and discusses whether the dogma of DNA invariance between cell types is indeed correct. We will also discuss how single nucleotide somatic variations arise, focusing on the presence of specific DNA mutations in the brain.
    Frontiers in Aging Neuroscience 11/2014; 6:323. DOI:10.3389/fnagi.2014.00323 · 4.00 Impact Factor