ABSTRACT: Background
The recent expansion of whole-genome sequence data available from diverse animal lineages provides an opportunity to investigate the evolutionary origins of specific classes of human disease genes. Previous studies have observed that human disease genes are of particularly ancient origin. While this suggests that many animal species have the potential to serve as feasible models for research on genes responsible for human disease, it is unclear whether this pattern has meaningful implications and whether it prevails for every class of human disease.
Results
We used a comparative genomics approach encompassing a broad phylogenetic range of animals with sequenced genomes to determine the evolutionary patterns exhibited by human genes associated with different classes of disease. Our results support previous claims that most human disease genes are of ancient origin but, more importantly, we also demonstrate that several specific disease classes have a significantly large proportion of genes that emerged relatively recently within the metazoans and/or vertebrates. An independent assessment of the synonymous to non-synonymous substitution rates of human disease genes found in mammals reveals that disease classes that arose more recently also display unexpected rates of purifying selection between their mammalian and human counterparts.
Conclusions
Our results reveal the heterogeneity underlying the evolutionary origins of (and selective pressures on) different classes of human disease genes. For example, some disease gene classes appear to be of uncommonly recent (i.e., vertebrate-specific) origin and, as a whole, have been evolving at a faster rate within mammals than the majority of disease classes having more ancient origins. The novel patterns that we have identified may provide new insight into cases where studies using traditional animal models were unable to produce results that translated to humans. Conversely, we note that the larger set of disease classes do have ancient origins, suggesting that many non-traditional animal models have the potential to be useful for studying many human disease genes. Taken together, these findings emphasize why model organism selection should be done on a disease-by-disease basis, with evolutionary profiles in mind.
ABSTRACT: Mnemiopsis leidyi is a ctenophore native to the coastal waters of the western Atlantic Ocean. A number of studies on Mnemiopsis have led to a better understanding of many key biological processes, and these studies have contributed to the emergence of Mnemiopsis as an important model for evolutionary and developmental studies. Recently, we sequenced, assembled, annotated, and performed a preliminary analysis on the 150-megabase genome of the ctenophore, Mnemiopsis. This sequencing effort has produced the first set of whole-genome sequencing data on any ctenophore species and is amongst the first wave of projects to sequence an animal genome de novo using solely next-generation sequencing technologies.
Description: The Mnemiopsis Genome Project Portal (http://research.nhgri.nih.gov/mnemiopsis/) is intended both as a resource for obtaining genomic information on Mnemiopsis through an intuitive and easy-to-use interface and as a model for developing customized Web portals that enable access to genomic data. The scope of data available through this Portal goes well beyond the sequence data available through GenBank, providing key biological information not available elsewhere, such as pathway and protein domain analyses; it also features a customized genome browser for data visualization.
We expect that the availability of these data will allow investigators to advance their own research projects aimed at understanding phylogenetic diversity and the evolution of proteins that play a fundamental role in metazoan development. The overall approach taken in the development of this Web site can serve as a viable model for disseminating data from whole-genome sequencing projects, framed in a way that best-serves the specific needs of the scientific community.
ABSTRACT: Eukaryotic chromatin is composed of DNA and protein components (core histones) that act to compactly pack the DNA into nucleosomes, the fundamental building blocks of chromatin. These nucleosomes are connected to adjacent nucleosomes by linker histones. Nucleosomes are highly dynamic and, through various core histone post-translational modifications and incorporation of diverse histone variants, can serve as epigenetic marks to control processes such as gene expression and recombination. The Histone Sequence Database is a curated collection of sequences and structures of histones and non-histone proteins containing histone folds, assembled from major public databases. Here, we report a substantial increase in the number of sequences and taxonomic coverage for histone and histone fold-containing proteins available in the database. Additionally, the database now contains an expanded dataset that includes archaeal histone sequences. The database also provides comprehensive multiple sequence alignments for each of the four core histones (H2A, H2B, H3 and H4), the linker histones (H1/H5) and the archaeal histones. The database also includes current information on solved histone fold-containing structures. The Histone Sequence Database is an inclusive resource for the analysis of chromatin structure and function focused on histones and histone fold-containing proteins.
Database URL: The Histone Sequence Database is freely available and can be accessed at http://research.nhgri.nih.gov/histones/.
Database: The Journal of Biological Databases and Curation 01/2011; 2011:bar048. DOI:10.1093/database/bar048 · 3.37 Impact Factor
ABSTRACT: We describe the creation of a specialized web-accessible database named the Pigment Cell Gene Resource, which contains information on the genetic pathways that regulate pigment cell development and function. This manually curated database comprises two sections: an annotated literature section and an interactive transcriptional network diagram. Initially, this database focuses on the transcription factor SOX10, which has essential roles in pigment cell development and function, but the database has been designed with the capacity to expand in the future, allowing inclusion of many more pigmentation genes.
Database URL: http://research.nhgri.nih.gov/pigment_cell/
Database: The Journal of Biological Databases and Curation 01/2010; 2010:baq025. DOI:10.1093/database/baq025 · 3.37 Impact Factor
ABSTRACT: Hermansky-Pudlak syndrome (HPS) is an autosomal recessive disorder characterized by oculocutaneous albinism and prolonged bleeding. Eight human genes have been described whose mutation results in HPS subtypes 1-8. Certain HPS proteins combine to form Biogenesis of Lysosome-related Organelles Complexes (BLOCs), thought to function in the formation of intracellular vesicles such as melanosomes, platelet dense bodies, and lytic granules. Specifically, BLOC-2 contains the HPS3, HPS5 and HPS6 proteins. We used phylogenetic footprinting to identify conserved regions in the upstream sequences of HPS3, HPS5 and HPS6. These conserved regions were verified to have in vitro transcription activation activity using luciferase reporter assays. Transcription factor binding site analyses of the regions identified 52 putative sites shared by all three genes. When analysis was limited to the conserved footprints, seven binding sites were found to be shared among all three genes: Pax-5, AIRE, CACD, ZF5, Zic1, E2F and Churchill. The HPS3 conserved upstream region was sequenced in four patients with decreased fibroblast HPS3 RNA levels and only one HPS3 mutation in the coding exons and surrounding exon/intron boundaries; no mutation was found. These findings illustrate the power of phylogenetic footprinting for identifying potential regulatory regions in non-coding sequences and define the first putative promoter elements for any HPS genes.
Annals of Human Genetics 08/2009; 73(Pt 4):422-8. DOI:10.1111/j.1469-1809.2009.00525.x · 2.21 Impact Factor
ABSTRACT: The Homeodomain Resource is a curated collection of sequence, structure, interaction, genomic and functional information on the homeodomain family. The current version builds upon previous versions by the addition of new, complete sets of homeodomain sequences from fully sequenced genomes, the expansion of existing curated homeodomain information and the improvement of data accessibility through better search tools and more complete data integration. This release contains 1534 full-length homeodomain-containing sequences, 93 experimentally derived homeodomain structures, 101 homeodomain protein–protein interactions, 107 homeodomain DNA-binding sites and 206 homeodomain proteins implicated in human genetic disorders.
Database URL: The Homeodomain Resource is freely available and can be accessed at http://research.nhgri.nih.gov/homeodomain/
Database: The Journal of Biological Databases and Curation 04/2009; 2009:bap004. DOI:10.1093/database/bap004 · 3.37 Impact Factor
ABSTRACT: The Encyclopedia of DNA Elements (ENCODE) project aims to identify and characterize all functional elements in a representative chromosomal sample comprising 1% of the human genome. Data generated by members of The ENCODE Project Consortium are housed in a number of public databases, such as the UCSC Genome Browser, NCBI's Gene Expression Omnibus (GEO), and EBI's ArrayExpress. As such, it is often difficult for biologists to gather all of the ENCODE data from a particular genomic region of interest and integrate them with relevant information found in other public databases. The ENCODEdb portal was developed to address this problem. ENCODEdb provides a unified, single point-of-access to data generated by the ENCODE Consortium, as well as to data from other source databases that lie within ENCODE regions; this provides the user a complete view of all known data in a particular region of interest. ENCODEdb Genomic Context searches allow for the retrieval of information on functional elements annotated within ENCODE regions, including mRNA, EST, and STS sequences; single nucleotide polymorphisms; and UniGene clusters. Information is also retrieved from GEO, OMIM, and major genome sequence browsers. ENCODEdb Consortium Data searches allow users to perform compound queries on array-based ENCODE data available both from GEO and from the UCSC Genome Browser. Results are retrieved from a specific genomic area of interest and can be further manipulated in a variety of contexts, including the UCSC Genome Browser and the Galaxy large-scale genome analysis platform. The ENCODEdb portal is freely accessible at http://research.nhgri.nih.gov/ENCODEdb.
Genome Research 07/2007; 17(6):954-9. DOI:10.1101/gr.5582207 · 14.63 Impact Factor