Publications (9) · 116.84 Total impact
Article: The ENCODEdb portal: simplified access to ENCODE Consortium data.
ABSTRACT: The Encyclopedia of DNA Elements (ENCODE) project aims to identify and characterize all functional elements in a representative chromosomal sample comprising 1% of the human genome. Data generated by members of The ENCODE Project Consortium are housed in a number of public databases, such as the UCSC Genome Browser, NCBI's Gene Expression Omnibus (GEO), and EBI's ArrayExpress. As such, it is often difficult for biologists to gather all of the ENCODE data from a particular genomic region of interest and integrate them with relevant information found in other public databases. The ENCODEdb portal was developed to address this problem. ENCODEdb provides a unified, single point-of-access to data generated by the ENCODE Consortium, as well as to data from other source databases that lie within ENCODE regions; this provides the user a complete view of all known data in a particular region of interest. ENCODEdb Genomic Context searches allow for the retrieval of information on functional elements annotated within ENCODE regions, including mRNA, EST, and STS sequences; single nucleotide polymorphisms, and UniGene clusters. Information is also retrieved from GEO, OMIM, and major genome sequence browsers. ENCODEdb Consortium Data searches allow users to perform compound queries on array-based ENCODE data available both from GEO and from the UCSC Genome Browser. Results are retrieved from a specific genomic area of interest and can be further manipulated in a variety of contexts, including the UCSC Genome Browser and the Galaxy large-scale genome analysis platform. The ENCODEdb portal is freely accessible at http://research.nhgri.nih.gov/ENCODEdb.
Genome Research 07/2007; 17(6):954-9. · 13.61 Impact Factor
Article: Informatic and genomic analysis of melanocyte cDNA libraries as a resource for the study of melanocyte development and function.
ABSTRACT: As part of the RIKEN mouse encyclopedia project, two cDNA libraries were prepared from melanocyte-derived cell lines, using techniques of full-length clone selection and subtraction/normalization to enrich for rare transcripts. End sequencing showed that these libraries display over 83% complete coding sequence at the 5' end and 96-97% complete coding sequence at the 3' end. Evaluation of the libraries, derived from B16F10Y tumor cells and melan-c cells, revealed that they contain clones for a majority of the genes previously demonstrated to function in melanocyte biology. Analysis of genomic locations for transcripts revealed that the distribution of melanocyte genes is non-random throughout the genome. Three genomic regions identified that showed significant clustering of melanocyte-expressed genes contain one or more genes previously shown to regulate melanocyte development or function. A catalog of genes expressed in these libraries is presented, providing a valuable resource of cDNA clones and sequence information that can be used for identification of new genes important for melanocyte development, function, and disease.
Pigment Cell Research 07/2007; 20(3):201-9. · 4.29 Impact Factor
Article: GeneLink: a database to facilitate genetic studies of complex traits.
ABSTRACT: In contrast to gene-mapping studies of simple Mendelian disorders, genetic analyses of complex traits are far more challenging, and high quality data management systems are often critical to the success of these projects. To minimize the difficulties inherent in complex trait studies, we have developed GeneLink, a Web-accessible, password-protected Sybase database. GeneLink is a powerful tool for complex trait mapping, enabling genotypic data to be easily merged with pedigree and extensive phenotypic data. Specifically designed to facilitate large-scale (multi-center) genetic linkage or association studies, GeneLink securely and efficiently handles large amounts of data and provides additional features to facilitate data analysis by existing software packages and quality control. These include the ability to download chromosome-specific data files containing marker data in map order in various formats appropriate for downstream analyses (e.g., GAS and LINKAGE). Furthermore, an unlimited number of phenotypes (either qualitative or quantitative) can be stored and analyzed. Finally, GeneLink generates several quality assurance reports, including genotyping success rates of specified DNA samples or success and heterozygosity rates for specified markers. GeneLink has already proven an invaluable tool for complex trait mapping studies and is discussed primarily in the context of our large, multi-center study of hereditary prostate cancer (HPC). GeneLink is freely available at http://research.nhgri.nih.gov/genelink.
BMC Genomics 11/2004; 5:81. · 4.07 Impact Factor
Article: Short interfering RNAs can induce unexpected and divergent changes in the levels of untargeted proteins in mammalian cells.
ABSTRACT: RNA interference (RNAi) mediated by short interfering RNAs (siRNAs) is a widely used method to analyze gene function. To use RNAi knockdown accurately to infer gene function, it is essential to determine the specificity of siRNA-mediated RNAi. We have assessed the specificity of 10 different siRNAs corresponding to the MEN1 gene by examining the expression of two additional genes, TP53 (p53) and CDKN1A (p21), which are considered functionally unrelated to menin but are sensitive markers of cell state. MEN1 RNA and corresponding protein levels were all reduced after siRNA transfection of HeLa cells, although the degree of inhibition mediated by individual siRNAs varied. Unexpectedly, we observed dramatic and significant changes in protein levels of p53 and p21 that were unrelated to silencing of the target gene. The modulations in p53 and p21 levels were not abolished on titration of the siRNAs, and similar results were obtained in three other cell lines; in none of the cell lines tested did we see an effect on the protein levels of actin. These data suggest that siRNAs can induce nonspecific effects on protein levels that are siRNA sequence dependent but that these effects may be difficult to detect until genes central to a pivotal cellular response, such as p53 and p21, are studied. We find no evidence that activation of the double-stranded RNA-triggered IFN-associated antiviral pathways accounts for these effects, but we speculate that partial complementary sequence matches to off-target genes may result in a micro-RNA-like inhibition of translation.
Proceedings of the National Academy of Sciences 03/2004; 101(7):1892-7. · 9.68 Impact Factor
Article: GeneLink: a database to facilitate genetic studies of complex traits
ABSTRACT: Background: In contrast to gene-mapping studies of simple Mendelian disorders, genetic analyses of complex traits are far more challenging, and high quality data management systems are often critical to the success of these projects. To minimize the difficulties inherent in complex trait studies, we have developed GeneLink, a Web-accessible, password-protected Sybase database. Results: GeneLink is a powerful tool for complex trait mapping, enabling genotypic data to be easily merged with pedigree and extensive phenotypic data. Specifically designed to facilitate large-scale (multi-center) genetic linkage or association studies, GeneLink securely and efficiently handles large amounts of data and provides additional features to facilitate data analysis by existing software packages and quality control. These include the ability to download chromosome-specific data files containing marker data in map order in various formats appropriate for downstream analyses (e.g., GAS and LINKAGE). Furthermore, an unlimited number of phenotypes (either qualitative or quantitative) can be stored and analyzed. Finally, GeneLink generates several quality assurance reports, including genotyping success rates of specified DNA samples or success and heterozygosity rates for specified markers. Conclusions: GeneLink has already proven an invaluable tool for complex trait mapping studies and is discussed primarily in the context of our large, multi-center study of hereditary prostate cancer (HPC). GeneLink is freely available at http://research.nhgri.nih.gov/genelink.
BMC Genomics 01/2004.
Article: The sequence and analysis of Trypanosoma brucei chromosome II.
ABSTRACT: We report here the sequence of chromosome II from Trypanosoma brucei, the causative agent of African sleeping sickness. The 1.2-Mb pairs encode about 470 predicted genes organised in 17 directional clusters on either strand, the largest cluster of which has 92 genes lined up over a 284-kb region. An analysis of the GC skew reveals strand compositional asymmetries that coincide with the distribution of protein-coding genes, suggesting these asymmetries may be the result of transcription-coupled repair on coding versus non-coding strand. A 5-cM genetic map of the chromosome reveals recombinational 'hot' and 'cold' regions, the latter of which is predicted to include the putative centromere. One end of the chromosome consists of a 250-kb region almost exclusively composed of RHS (pseudo)genes that belong to a newly characterised multigene family containing a hot spot of insertion for retroelements. Interspersed with the RHS genes are a few copies of truncated RNA polymerase pseudogenes as well as expression site associated (pseudo)genes (ESAGs) 3 and 4, and 76 bp repeats. These features are reminiscent of a vestigial variant surface glycoprotein (VSG) gene expression site. The other end of the chromosome contains a 30-kb array of VSG genes, the majority of which are pseudogenes, suggesting that this region may be a site for modular de novo construction of VSG gene diversity during transposition/gene conversion events.
Nucleic Acids Research 09/2003; 31(16):4856-63. · 8.03 Impact Factor
Article: The Brucella suis genome reveals fundamental similarities between animal and plant pathogens and symbionts.
ABSTRACT: The 3.31-Mb genome sequence of the intracellular pathogen and potential bioterrorism agent, Brucella suis, was determined. Comparison of B. suis with Brucella melitensis has defined a finite set of differences that could be responsible for the differences in virulence and host preference between these organisms, and indicates that phage have played a significant role in their divergence. Analysis of the B. suis genome reveals transport and metabolic capabilities akin to soil/plant-associated bacteria. Extensive gene synteny between B. suis chromosome 1 and the genome of the plant symbiont Mesorhizobium loti emphasizes the similarity between this animal pathogen and plant pathogens and symbionts. A limited repertoire of genes homologous to known bacterial virulence factors were identified.
Proceedings of the National Academy of Sciences 11/2002; 99(20):13148-53. · 9.68 Impact Factor
Article: Comparative genome sequencing for discovery of novel polymorphisms in Bacillus anthracis.
ABSTRACT: Comparison of the whole-genome sequence of Bacillus anthracis isolated from a victim of a recent bioterrorist anthrax attack with a reference reveals 60 new markers that include single nucleotide polymorphisms (SNPs), inserted or deleted sequences, and tandem repeats. Genome comparison detected four high-quality SNPs between the two sequenced B. anthracis chromosomes and seven differences among different preparations of the reference genome. These markers have been tested on a collection of anthrax isolates and were found to divide these samples into distinct families. These results demonstrate that genome-based analysis of microbial pathogens will provide a powerful new tool for investigation of infectious disease outbreaks.
Science 07/2002; 296(5575):2028-33. · 31.20 Impact Factor
Article: Sequence and analysis of chromosome 2 of the plant Arabidopsis thaliana
ABSTRACT: Arabidopsis thaliana (Arabidopsis) is unique among plant model organisms in having a small genome (130–140 Mb), excellent physical and genetic maps, and little repetitive DNA. Here we report the sequence of chromosome 2 from the Columbia ecotype in two gap-free assemblies (contigs) of 3.6 and 16 megabases (Mb). The latter represents the longest published stretch of uninterrupted DNA sequence assembled from any organism to date. Chromosome 2 represents 15% of the genome and encodes 4,037 genes, 49% of which have no predicted function. Roughly 250 tandem gene duplications were found in addition to large-scale duplications of about 0.5 and 4.5 Mb between chromosomes 2 and 1 and between chromosomes 2 and 4, respectively. Sequencing of nearly 2 Mb within the genetically defined centromere revealed a low density of recognizable genes, and a high density and diverse range of vestigial and presumably inactive mobile elements. More unexpected is what appears to be a recent insertion of a continuous stretch of 75% of the mitochondrial genome into chromosome 2.
Nature 12/1999; 402(6763):761-768. · 36.28 Impact Factor