The HuRef Browser: a web resource for individual human genomics

J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA.
Nucleic Acids Research. 12/2008; 37(Database issue):D1018-24. DOI: 10.1093/nar/gkn939
Source: PubMed


The HuRef Genome Browser is a web application for the navigation and analysis of the previously published genome of a human
individual, termed HuRef. The browser provides a comparative view between the NCBI human reference sequence and the HuRef
assembly, and it enables the navigation of the HuRef genome in the context of HuRef, NCBI and Ensembl annotations. Single
nucleotide polymorphisms, indels, inversions, structural and copy-number variations are shown in the context of existing functional
annotations on either genome in the comparative view. Demonstrated here are some potential uses of the browser to enable a
better understanding of individual human genetic variation. The browser provides full access to the underlying reads with
sequence and quality information, the genome assembly and the evidence supporting the identification of DNA polymorphisms.
The HuRef Browser is a unique and versatile tool for browsing genome assemblies and studying individual human sequence variation
in a diploid context. The browser is available online at
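
As a rough illustration of the variant categories named above (SNPs, indels, inversions), the following Python sketch assigns a coarse type to a pair of reference and alternate alleles. It is a hypothetical helper, not code from the HuRef Browser. The function names, the use of "-" for an empty allele, and the trimming of shared flanking bases (so that anchor-padded indels such as A>AT are recognized) are assumptions. Structural and copy-number variants cannot be classified this way, because they are identified from read-pair, read-depth or assembly-comparison evidence rather than from short allele strings.

```python
# Hypothetical helper, not the HuRef Browser's implementation.
# Complement table for uppercase DNA bases (N maps to N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _trim_shared(ref, alt):
    """Remove bases shared at the start, then at the end, of both alleles."""
    i = 0
    while i < min(len(ref), len(alt)) and ref[i] == alt[i]:
        i += 1
    ref, alt = ref[i:], alt[i:]
    j = 0
    while j < min(len(ref), len(alt)) and ref[-1 - j] == alt[-1 - j]:
        j += 1
    return ref[:len(ref) - j], alt[:len(alt) - j]


def classify_variant(ref, alt):
    """Coarse variant type for a reference/alternate allele pair.

    Assumption: both alleles describe the same reference interval;
    "-" denotes an empty allele.
    """
    ref = "" if ref == "-" else ref.upper()
    alt = "" if alt == "-" else alt.upper()
    if ref == alt:
        return "no_change"
    if len(ref) == len(alt):
        if len(ref) == 1:
            return "SNP"
        # Same-length segment replaced by its reverse complement.
        if alt == reverse_complement(ref):
            return "inversion"
        return "MNP"  # multi-nucleotide substitution
    # Lengths differ: strip shared padding, then see what is left.
    ref, alt = _trim_shared(ref, alt)
    if not ref:
        return "insertion"
    if not alt:
        return "deletion"
    return "complex_indel"
```

For example, `classify_variant("A", "AT")` gives `"insertion"` and `classify_variant("ACCT", "AGGT")` gives `"inversion"`, because AGGT is the reverse complement of ACCT.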

  • "Moreover, the cooperative integration of different genomic technologies is necessary for high-accuracy detection of variants, especially of CNVs (6,10,16). Although many genomic databases and browsers have been developed (20–24), the comparison and integration of genomic data sets from different platforms is not yet feasible."
    ABSTRACT: High-throughput genomic technologies have been used to explore personal human genomes for the past few years. Although the integration of technologies is important for high-accuracy detection of personal genomic variations, no databases have been prepared to systematically archive genomes and to facilitate the comparison of personal genomic data sets prepared using a variety of experimental platforms. We describe here the Total Integrated Archive of Short-Read and Array (TIARA) database, which contains personal genomic information obtained from next generation sequencing (NGS) techniques and ultra-high-resolution comparative genomic hybridization (CGH) arrays. This database improves the accuracy of detecting personal genomic variations, such as SNPs, short indels and structural variants (SVs). At present, 36 individual genomes have been archived and may be displayed in the database. TIARA supports a user-friendly genome browser, which retrieves read-depths (RDs) and log2 ratios from NGS and CGH arrays, respectively. In addition, this database provides information on all genomic variants and the raw data, including short reads and feature-level CGH data, through anonymous file transfer protocol. More personal genomes will be archived as more individuals are analyzed by NGS or CGH array. TIARA provides a new approach to the accurate interpretation of personal genomes for genome research.
    Article · Nov 2010 · Nucleic Acids Research
  • "Most known genome browsers, such as NCBI genome [1] and Craig Venter's genome browsers [2], were built for consensus sequences from multiple individuals to construct a reference human genome. Examples of haplotype genome browsers are NCBI, UCSC [3], Ensembl [4], and Venter genome browsers."
    ABSTRACT: Background: The first Korean individual diploid genome sequence data (KOREF) was publicized in December 2008. Results: A Korean genome variation analysis and browsing server (Gevab) was constructed as a database and web server for the exploration and downloading of Korean personal genome(s). Information in the Gevab includes SNPs, short indels, and structural variation (SV) and comparison analysis between the NCBI human reference and the Korean genome(s). The user can find information on assembled consensus sequences, sequenced short reads, genetic variations, and relationships between genotype and phenotypes. Conclusion: This server is openly and publicly available online at or directly
    Article · Dec 2009 · BMC Bioinformatics
  • ABSTRACT: The automated retrieval and integration of information about protein point mutations in combination with structure, domain and interaction data from literature and databases promises to be a valuable approach to study structure-function relationships in biomedical data sets. We developed a rule- and regular expression-based protein point mutation retrieval pipeline for PubMed abstracts, which shows an F-measure of 87% for the mutation retrieval task on a benchmark dataset. In order to link mutations to their proteins, we utilize a named entity recognition algorithm for the identification of gene names co-occurring in the abstract, and establish links based on sequence checks. Conversely, we showed that gene recognition improved from 77% to 91% F-measure when considering mutation information given in the text. To demonstrate practical relevance, we utilize mutation information from text to evaluate a novel solvation energy based model for the prediction of stabilizing regions in membrane proteins. For five G protein-coupled receptors we identified 35 relevant single mutations and associated phenotypes, of which none had been annotated in the UniProt or PDB database. In 71% of cases, the reported phenotypes were consistent with the model predictions, supporting a relation between mutations and stability issues in membrane proteins. We present a reliable approach for the retrieval of protein mutations from PubMed abstracts for any set of genes or proteins of interest. We further demonstrate how amino acid substitution information from text can be utilized for protein structure stability studies on the basis of a novel energy model.
    Article · Aug 2009 · BMC Bioinformatics
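
The last abstract above describes rule- and regular-expression-based retrieval of protein point mutations from text, evaluated by F-measure. The Python sketch below illustrates that general technique under simple assumptions; it is not the published pipeline. The patterns, the names `find_point_mutations` and `f_measure`, and the restriction to forms such as `A123T` and `Ala123Thr` are hypothetical choices for illustration. A practical system would need further rules to reject false positives, such as gene or cell-line names that happen to look like mutations.

```python
import re

# Hypothetical sketch, not the published retrieval pipeline.
# One-letter codes of the 20 standard amino acids.
AA1 = "ACDEFGHIKLMNPQRSTVWY"
# Three-letter codes mapped to one-letter codes.
AA3 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Wild-type residue, position, mutant residue, e.g. "A123T".
ONE_LETTER = re.compile(rf"\b([{AA1}])([1-9]\d*)([{AA1}])\b")
# Same pattern with case-sensitive three-letter codes, e.g. "Ala123Thr".
_THREE = "|".join(AA3)
THREE_LETTER = re.compile(rf"\b({_THREE})([1-9]\d*)({_THREE})\b")


def find_point_mutations(text):
    """Return normalized (wild_type, position, mutant) tuples sorted by position."""
    found = set()
    for wt, pos, mt in ONE_LETTER.findall(text):
        if wt != mt:  # skip silent mentions such as "G12G"
            found.add((wt, int(pos), mt))
    for wt, pos, mt in THREE_LETTER.findall(text):
        if wt != mt:
            found.add((AA3[wt], int(pos), AA3[mt]))
    return sorted(found, key=lambda m: (m[1], m[0], m[2]))


def f_measure(tp, fp, fn):
    """Balanced F-measure (F1) from true positives, false positives and false negatives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, `find_point_mutations("The A123T substitution and Ala45Gly both reduced stability; G12G was silent.")` returns `[('A', 45, 'G'), ('A', 123, 'T')]`; `G12G` is skipped because the wild-type and mutant residues are identical. The F-measure is the harmonic mean of precision and recall, so equal precision and recall of 0.87 give an F-measure of 0.87.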