ABSTRACT: The Human Phenotype Ontology (HPO) project, available at http://www.human-phenotype-ontology.org, provides a structured, comprehensive and well-defined set of 10,088 classes (terms) describing human phenotypic abnormalities and 13,326 subclass relations between the HPO classes. In addition we have developed logical definitions for 46% of all HPO classes using terms from ontologies for anatomy, cell types, function, embryology, pathology and other domains. This allows interoperability with several resources, especially those containing phenotype information on model organisms such as mouse and zebrafish. Here we describe the updated HPO database, which provides annotations of 7,278 human hereditary syndromes listed in OMIM, Orphanet and DECIPHER to classes of the HPO. Various meta-attributes such as frequency, references and negations are associated with each annotation. Several large-scale projects worldwide utilize the HPO for describing phenotype information in their datasets. We have therefore generated equivalence mappings to other phenotype vocabularies such as LDDB, Orphanet, MedDRA, UMLS and phenoDB, allowing integration of existing datasets and interoperability with multiple biomedical resources. We have created various ways to access the HPO database content using flat files, a MySQL database, and Web-based tools. All data and documentation on the HPO project can be found online.
Nucleic Acids Research 11/2013; · 8.28 Impact Factor
ABSTRACT: The use of ontologies to standardize biological data and facilitate comparisons among datasets has steadily grown as the complexity and amount of available data have increased. Despite the numerous ontologies available, one area currently lacking a robust ontology is the description of vertebrate traits. A trait is defined as any measurable or observable characteristic pertaining to an organism or any of its substructures. While there are several ontologies to describe entities and processes in phenotypes, diseases, and clinical measurements, one has not been developed for vertebrate traits; the Vertebrate Trait Ontology (VT) was created to fill this void. Description: Significant inconsistencies in trait nomenclature exist in the literature, and additional difficulties arise when trait data are compared across species. The VT is a unified trait vocabulary created to aid in the transfer of data within and between species and to facilitate investigation of the genetic basis of traits. Trait information provides a valuable link between the measurements that are used to assess the trait, the phenotypes related to the traits, and the diseases associated with one or more phenotypes. Because multiple clinical and morphological measurements are often used to assess a single trait, and a single measurement can be used to assess multiple physiological processes, providing investigators with standardized annotations for trait data will allow them to investigate connections among these data types.
The annotation of genomic data with ontology terms provides unique opportunities for data mining and analysis. Links between data in disparate databases can be identified and explored, a strategy that is particularly useful for cross-species comparisons or in situations involving inconsistent terminology. The VT provides a common basis for the description of traits in multiple vertebrate species. It is being used in the Rat Genome Database and Animal QTL Database for annotation of QTL data for rat, cattle, chicken, swine, sheep, and rainbow trout, and in the Mouse Phenome Database to annotate strain characterization data. In these databases, data are also cross-referenced to applicable terms from other ontologies, providing additional avenues for data mining and analysis. The ontology is available at http://bioportal.bioontology.org/ontologies/50138.
Journal of biomedical semantics. 08/2013; 4(1):13.
ABSTRACT: The laboratory mouse is the premier animal model for studying human biology because all life stages can be accessed experimentally, a completely sequenced reference genome is publicly available and there exists a myriad of genomic tools for comparative and experimental research. In the current era of genome scale, data-driven biomedical research, the integration of genetic, genomic and biological data is essential for realizing the full potential of the mouse as an experimental model. The Mouse Genome Database (MGD; http://www.informatics.jax.org), the community model organism database for the laboratory mouse, is designed to facilitate the use of the laboratory mouse as a model system for understanding human biology and disease. To achieve this goal, MGD integrates genetic and genomic data related to the functional and phenotypic characterization of mouse genes and alleles and serves as a comprehensive catalog for mouse models of human disease. Recent enhancements to MGD include the addition of human ortholog details to mouse Gene Detail pages, the inclusion of microRNA knockouts to MGD's catalog of alleles and phenotypes, the addition of video clips to phenotype images, providing access to genotype and phenotype data associated with quantitative trait loci (QTL) and improvements to the layout and display of Gene Ontology annotations.
Nucleic Acids Research 11/2012; · 8.28 Impact Factor
ABSTRACT: Full realization of the value of the loxP-flanked alleles generated by the International Knockout Mouse Consortium will require a large set of well-characterized cre-driver lines. However, many cre driver lines display excision activity beyond the intended tissue or cell type, and these data are frequently unavailable to the potential user. Here we describe a high-throughput pipeline to extend characterization of cre driver lines to document excision activity in a wide range of tissues at multiple time points and disseminate these data to the scientific community. Our results show that the majority of cre strains exhibit some degree of unreported recombinase activity. In addition, we observe frequent mosaicism, inconsistent activity and parent-of-origin effects. Together, these results highlight the importance of deep characterization of cre strains, and provide the scientific community with a critical resource for cre strain information.
ABSTRACT: In 2007, the International Knockout Mouse Consortium (IKMC) made the ambitious promise to generate mutations in virtually every protein-coding gene of the mouse genome in a concerted worldwide action. Now, 5 years later, the IKMC members have developed high-throughput gene trapping and, in particular, gene-targeting pipelines and generated more than 17,400 mutant murine embryonic stem (ES) cell clones and more than 1,700 mutant mouse strains, most of them conditional. A common IKMC web portal (www.knockoutmouse.org) has been established, allowing easy access to this unparalleled biological resource. The IKMC materials considerably enhance functional gene annotation of the mammalian genome and will have a major impact on future biomedical research.
ABSTRACT: The Mammalian Phenotype Ontology (MP) is a structured vocabulary for describing mammalian phenotypes and serves as a critical tool for efficient annotation and comprehensive retrieval of phenotype data. Importantly, the ontology contains broad and specific terms, facilitating annotation of data from initial observations or screens and detailed data from subsequent experimental research. Using the ontology structure, data are retrieved inclusively, i.e., data annotated to chosen terms and to terms subordinate in the hierarchy. Thus, searching for "abnormal craniofacial morphology" also returns annotations to "megacephaly" and "microcephaly," more specific terms in the hierarchy path. The development and refinement of the MP is ongoing, with new terms and modifications to its organization undergoing continuous assessment as users and expert reviewers propose expansions and revisions. A wealth of phenotype data on mouse mutations and variants annotated to the MP already exists in the Mouse Genome Informatics database. These data, along with data curated to the MP by many mouse mutagenesis programs and mouse repositories, provide a platform for comparative analyses and correlative discoveries. The MP provides a standard underpinning to mouse phenotype descriptions for existing and future experimental and large-scale phenotyping projects. In this review we describe the MP as it presently exists, its application to phenotype annotations, the relationship of the MP to other ontologies, and the integration of the MP within large-scale phenotyping projects. Finally we discuss future application of the MP in providing standard descriptors of the phenotype pipeline test results from the International Mouse Phenotype Consortium projects.
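The inclusive retrieval described in this abstract can be illustrated with a small sketch. It is not the MGI implementation: the hierarchy fragment, the genotype labels, and the function names `descendants` and `inclusive_search` are all hypothetical, and the three craniofacial terms are placed in a simplified direct parent-child arrangement purely for illustration. The idea shown is only that a query term is expanded to itself plus every subordinate term before annotations are matched.

```python
from collections import defaultdict

# Hypothetical toy fragment of the hierarchy: child term -> parent terms.
# The real ontology is far larger and its paths may contain intermediate terms.
IS_A = {
    "megacephaly": ["abnormal craniofacial morphology"],
    "microcephaly": ["abnormal craniofacial morphology"],
    "abnormal craniofacial morphology": ["mammalian phenotype"],
}

# Hypothetical annotations: (genotype label, annotated term).
ANNOTATIONS = [
    ("genotype A", "megacephaly"),
    ("genotype B", "microcephaly"),
    ("genotype C", "abnormal craniofacial morphology"),
    ("genotype D", "mammalian phenotype"),
]


def descendants(term, is_a):
    """Return the term itself plus every term subordinate to it."""
    # Invert the child -> parents map so the hierarchy can be walked downward.
    children = defaultdict(list)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].append(child)

    found = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def inclusive_search(term, is_a, annotations):
    """Return labels annotated to the term or to any of its subordinate terms."""
    terms = descendants(term, is_a)
    return sorted(label for label, annotated in annotations if annotated in terms)


if __name__ == "__main__":
    print(inclusive_search("abnormal craniofacial morphology", IS_A, ANNOTATIONS))
```

Under these toy assumptions, a query for the broad term picks up the annotations made to the two more specific terms, while a query for a specific term returns only its own annotations.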
ABSTRACT: With the effort of the International Phenotyping Consortium to produce thousands of strains with conditional potential gathering steam, there is growing recognition that it must be supported by a rich toolbox of cre driver strains. The approaches to build cre strains have evolved in both sophistication and reliability, replacing first-generation strains with tools that can target individual cell populations with incredible precision and specificity. The modest set of cre drivers generated by individual labs over the past 15+ years is now growing rapidly, thanks to a number of large-scale projects to produce new cre strains for the community. The power of this growing resource, however, depends upon the proper deep characterization of strain function, as even the best designed strain can display a variety of undesirable features that must be considered in experimental design. This must be coupled with the parallel development of informatics tools to provide functional data to the user and facilitated access to the strains through public repositories. We discuss the current progress on all of these fronts and the challenges that remain to ensure the scientific community can capitalize on the tremendous number of mouse resources at their disposal.
ABSTRACT: Optimal curation of human diseases requires an ontology or structured vocabulary that contains terms familiar to end users, is robust enough to support multiple levels of annotation granularity, is limited to disease terms and is stable enough to avoid extensive reannotation following updates. At Mouse Genome Informatics (MGI), we currently use disease terms from Online Mendelian Inheritance in Man (OMIM) to curate mouse models of human disease. While OMIM provides highly detailed disease records that are familiar to many in the medical community, it lacks structure to support multilevel annotation. To improve disease annotation at MGI, we evaluated the merged Medical Subject Headings (MeSH) and OMIM disease vocabulary created by the Comparative Toxicogenomics Database (CTD) project. Overlaying MeSH onto OMIM provides hierarchical access to broad disease terms, a feature missing from OMIM. We created an extended version of the vocabulary to meet the genetic disease-specific curation needs at MGI. Here we describe our evaluation of the CTD application and the extensions made by MGI, and discuss the strengths and weaknesses of this approach. DATABASE URL: http://www.informatics.jax.org/
Database The Journal of Biological Databases and Curation 01/2012; 2012:bar063. · 4.20 Impact Factor
ABSTRACT: The Mouse Genome Database (MGD, http://www.informatics.jax.org) is the international community resource for integrated genetic, genomic and biological data about the laboratory mouse. Data in MGD are obtained through loads from major data providers and experimental consortia, electronic submissions from laboratories and from the biomedical literature. MGD maintains a comprehensive, unified, non-redundant catalog of mouse genome features generated by distilling gene predictions from NCBI, Ensembl and VEGA. MGD serves as the authoritative source for the nomenclature of mouse genes, mutations, alleles and strains. MGD is the primary source for evidence-supported functional annotations for mouse genes and gene products using the Gene Ontology (GO). MGD provides full annotation of phenotypes and human disease associations for mouse models (genotypes) using terms from the Mammalian Phenotype Ontology and disease names from the Online Mendelian Inheritance in Man (OMIM) resource. MGD is freely accessible online through our website, where users can browse and search interactively, access data in bulk using Batch Query or BioMart, download data files or use our web services Application Programming Interface (API). Improvements to MGD include expanded genome feature classifications, inclusion of new mutant allele sets and phenotype associations and extensions of GO to include new relationships and a new stream of annotations via phylogenetic-based approaches.
Nucleic Acids Research 11/2011; 40(Database issue):D881-6. · 8.28 Impact Factor
ABSTRACT: The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
Nucleic Acids Research 01/2011; 39(Database issue):D7-10. · 8.28 Impact Factor
ABSTRACT: The Mouse Tumor Biology Database (MTB) is designed to provide an electronic data storage, search, and analysis system for information on mouse models of human cancer. The MTB includes data on tumor frequency and latency, strain, germ line, and somatic genetics, pathologic notations, and photomicrographs. The MTB collects data from the primary literature, other public databases, and direct submissions from the scientific community. The MTB is a community resource that provides integrated access to mouse tumor data from different scientific research areas and facilitates integration of molecular, genetic, and pathologic data. Current status of MTB, search capabilities, data types, and future enhancements are described in this article.
ABSTRACT: Recent advances in high-throughput gene targeting and conditional mutagenesis are creating new and powerful resources to study the in vivo function of mammalian genes using the mouse as an experimental model. Mutant ES cells and mice are being generated at a rapid rate to study the molecular and phenotypic consequences of genetic mutations, and to correlate these study results with human disease conditions. Likewise, classical genetics approaches to identify mutations in the mouse genome that cause specific phenotypes have become more effective. Here, we describe methods to quickly obtain information on what mutant ES cells and mice are available, including recombinase driver lines for the generation of conditional mutants. Further, we describe means to access genetic and phenotypic data that identify mouse models for specific human diseases.
ABSTRACT: The Gene Expression Database (GXD) is a community resource of mouse developmental expression information. GXD integrates different types of expression data at the transcript and protein level and captures expression information from many different mouse strains and mutants. GXD places these data in the larger biological context through integration with other Mouse Genome Informatics (MGI) resources and interconnections with many other databases. Web-based query forms support simple or complex searches that take advantage of all these integrated data. The data in GXD are obtained from the literature, from individual laboratories, and from large-scale data providers. All data are annotated and reviewed by GXD curators. Since the last report, the GXD data content has increased significantly, the interface and data displays have been improved, new querying capabilities were implemented, and links to other expression resources were added. GXD is available through the MGI web site (www.informatics.jax.org), or directly at www.informatics.jax.org/expression.shtml.
Nucleic Acids Research 11/2010; 39(Database issue):D835-41. · 8.28 Impact Factor
ABSTRACT: The Mouse Genome Database (MGD) is the community model organism database for the laboratory mouse and the authoritative source for phenotype and functional annotations of mouse genes. MGD includes a complete catalog of mouse genes and genome features with integrated access to genetic, genomic and phenotypic information, all serving to further the use of the mouse as a model system for studying human biology and disease. MGD is a major component of the Mouse Genome Informatics (MGI, http://www.informatics.jax.org/) resource. MGD contains standardized descriptions of mouse phenotypes, associations between mouse models and human genetic diseases, extensive integration of DNA and protein sequence data, and normalized representation of genome and genome variant information. Data are obtained and integrated via manual curation of the biomedical literature, direct contributions from individual investigators and downloads from major informatics resource centers. MGD collaborates with the bioinformatics community on the development and use of biomedical ontologies such as the Gene Ontology (GO) and the Mammalian Phenotype (MP) Ontology. Major improvements to the Mouse Genome Database include comprehensive update of genetic maps, implementation of new classification terms for genome features, development of a recombinase (cre) portal and inclusion of all alleles generated by the International Knockout Mouse Consortium (IKMC).
Nucleic Acids Research 11/2010; 39(Database issue):D842-8. · 8.28 Impact Factor
ABSTRACT: The International Knockout Mouse Consortium (IKMC) aims to mutate all protein-coding genes in the mouse using a combination of gene targeting and gene trapping in mouse embryonic stem (ES) cells and to make the generated resources readily available to the research community. The IKMC database and web portal (www.knockoutmouse.org) serves as the central public web site for IKMC data and facilitates the coordination and prioritization of work within the consortium. Researchers can access up-to-date information on IKMC knockout vectors, ES cells and mice for specific genes, and follow links to the respective repositories from which corresponding IKMC products can be ordered. Researchers can also use the web site to nominate genes for targeting, or to indicate that targeting of a gene should receive high priority. The IKMC database provides data to, and features extensive interconnections with, other community databases.
Nucleic Acids Research 10/2010; 39(Database issue):D849-55. · 8.28 Impact Factor
ABSTRACT: The recent explosion of biological data and the concomitant proliferation of distributed databases make it challenging for biologists and bioinformaticians to discover the best data resources for their needs, and the most efficient way to access and use them. Despite a rapid acceleration in uptake of syntactic and semantic standards for interoperability, it is still difficult for users to find which databases support the standards and interfaces that they need. To solve these problems, several groups are developing registries of databases that capture key metadata describing the biological scope, utility, accessibility, ease-of-use and existence of web services allowing interoperability between resources. Here, we describe some of these initiatives including a novel formalism, the Database Description Framework, for describing database operations and functionality and encouraging good database practise. We expect such approaches will result in improved discovery, uptake and utilization of data resources. Database URL: http://www.casimir.org.uk/casimir_ddf.
Database The Journal of Biological Databases and Curation 01/2010; 2010:baq014. · 4.20 Impact Factor