MEDIC: a practical disease vocabulary used at the Comparative Toxicogenomics Database

Department of Bioinformatics, The Mount Desert Island Biological Laboratory, Salisbury Cove, ME 04672, USA.
Database: The Journal of Biological Databases and Curation (Impact Factor: 3.37). 01/2012; 2012:bar065. DOI: 10.1093/database/bar065
Source: PubMed


The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding of the effects of environmental chemicals on human health. CTD biocurators manually curate a triad of chemical–gene, chemical–disease and gene–disease relationships from the scientific literature. The CTD curation paradigm uses controlled vocabularies for chemicals, genes and diseases. To curate disease information, CTD first had to identify a source of controlled terms. Two resources seemed to be good candidates: the Online Mendelian Inheritance in Man (OMIM) and the ‘Diseases’ branch of the National Library of Medicine's Medical Subject Headings (MeSH). To maximize the advantages of both, CTD biocurators undertook a novel initiative to map the flat list of OMIM disease terms into the hierarchical structure of the MeSH vocabulary. The result is CTD’s ‘merged disease vocabulary’ (MEDIC), a unique resource that integrates OMIM terms, synonyms and identifiers with MeSH terms, synonyms, definitions, identifiers and hierarchical relationships. MEDIC is both a deep and broad vocabulary, composed of 9700 unique diseases described by more than 67 000 terms (including synonyms). It is freely available to download in various formats from CTD. While neither a true ontology nor a perfect solution, this vocabulary has nonetheless proved to be extremely successful and practical for our biocurators in generating over 2.5 million disease-associated toxicogenomic relationships in CTD. Other external databases have also begun to adopt MEDIC for their disease vocabulary. Here, we describe the construction, implementation, maintenance and use of MEDIC to raise awareness of this resource and to offer it as a putative scaffold in the formal construction of an official disease ontology.
Database URL:
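
The mapping described in the abstract (placing a flat list of OMIM terms into a MeSH-style hierarchy) can be illustrated with a minimal sketch. This is not CTD's actual procedure or file format. It assumes, for illustration only, that each OMIM term is either merged into an equivalent MeSH term (contributing its name, synonyms and identifier) or attached as a child of a MeSH term. All identifiers, names and function names below are hypothetical toy values.

```python
from dataclasses import dataclass, field


@dataclass
class Disease:
    """One node of the merged vocabulary (hypothetical structure)."""
    primary_id: str
    name: str
    synonyms: set = field(default_factory=set)
    alt_ids: set = field(default_factory=set)   # identifiers merged in from OMIM
    parents: set = field(default_factory=set)   # ids of parent terms


def merge_vocabularies(mesh, omim, mapping):
    """Place flat OMIM terms into a MeSH-style hierarchy.

    mesh:    {mesh_id: (name, synonyms, parent_ids)}
    omim:    {omim_id: (name, synonyms)}
    mapping: {omim_id: (mesh_id, relation)}, relation is "equivalent" or "child"
             (an assumed curation decision, made by hand in this sketch)
    """
    vocab = {
        mesh_id: Disease(mesh_id, name, set(syns), set(), set(parents))
        for mesh_id, (name, syns, parents) in mesh.items()
    }
    for omim_id, (name, syns) in omim.items():
        mesh_id, relation = mapping[omim_id]
        if relation == "equivalent":
            # Fold the OMIM term into the existing MeSH node.
            node = vocab[mesh_id]
            node.synonyms |= {name, *syns} - {node.name}
            node.alt_ids.add(omim_id)
        elif relation == "child":
            # Add the OMIM term as a new node below the MeSH term.
            vocab[omim_id] = Disease(omim_id, name, set(syns), set(), {mesh_id})
        else:
            raise ValueError(f"unknown relation: {relation!r}")
    return vocab


def descendants(vocab, root_id):
    """Return the ids of all terms below root_id in the hierarchy."""
    children = {}
    for node in vocab.values():
        for parent in node.parents:
            children.setdefault(parent, set()).add(node.primary_id)
    found, stack = set(), [root_id]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


# Toy data: every id and name here is made up for the example.
mesh = {
    "MESH:T1": ("Toy disease category", [], []),
    "MESH:T2": ("Toy disorder", ["Toy syndrome"], ["MESH:T1"]),
}
omim = {
    "OMIM:T100": ("TOY DISORDER, TYPE 1", ["TD1"]),
    "OMIM:T200": ("TOY DISORDER", []),
}
mapping = {
    "OMIM:T100": ("MESH:T2", "child"),
    "OMIM:T200": ("MESH:T2", "equivalent"),
}

vocab = merge_vocabularies(mesh, omim, mapping)
print(sorted(descendants(vocab, "MESH:T1")))
```

Keeping the OMIM identifier as an alternate id on a merged node means a lookup by either identifier can reach the same disease, while the `descendants` helper shows how a hierarchical vocabulary lets a query on a broad category collect every term beneath it.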



Available from: Allan Peter Davis, Jul 20, 2015
    • "1) Improving the similarity graph: In this experimental phase, only some of the accessible data were used (learning_outcome, grouped_outcome, primary_index, secondary_index). Further development of this approach will be based on incorporating data like MeSH [4] terms. There is also an opportunity to experiment with weights: terms contained [...]"
    ABSTRACT: This contribution demonstrates how to apply concepts of social network analysis to educational data. The main aim of this approach is to provide deeper insight into the structure of courses and other learning units that belong to a given curriculum in order to improve the learning process. The presented work can help discover communities of similar study disciplines (based on similarity measures of the textual descriptions of their contents), identify important courses strongly linked to others, and find more independent and less important parts of the curriculum using centrality measures from graph theory and social network analysis.
    5th International Workshop on Artificial Intelligence in Medical Applications, Lodz, Poland; 09/2015
    • "The pathway-gene relationships for enrichment analysis are based on KEGG [24] and REACTOME [25] pathway databases. The analysis of enriched diseases utilize MEDIC disease vocabulary [26], a combination of Medical Subject Headings (MeSH) [27] and Online Mendelian Inheritance in Man (OMIM) [28] databases. Several web [...]"
    ABSTRACT: Maleic acid is a multi-functional chemical widely applied in the manufacturing of polymer products including food packaging. However, the contamination of modified starch with maleic acid has raised concerns about the effects of chronic exposure to maleic acid on human health. This study proposed a novel toxicogenomics approach for inferring functions, pathways and diseases potentially affected by maleic acid in humans by using known interactions between maleic acid and proteins. Neuronal signal transmission and cell metabolism were identified as the functions most influenced by maleic acid in this study. The top disease categories inferred to be associated with maleic acid were mental disorders, nervous system diseases, cardiovascular diseases, and cancers. The results from an in silico analysis showed that maleic acid could penetrate the blood-brain barrier to affect the nervous system. Several functions and pathways were further analyzed and identified to give insights into the mechanisms of maleic acid-associated diseases. The toxicogenomics approach may offer both a better understanding of the potential risks of maleic acid exposure to humans and a direction for future toxicological investigation.
    Chemico-Biological Interactions 09/2014; 223. DOI:10.1016/j.cbi.2014.09.004 · 2.58 Impact Factor
    • "The user has to provide the presumptive (or known) Mendelian disorder associated to the sample, the mode of inheritance and the platform used for exome target enrichment. The disease has to be chosen using a fixed vocabulary implementing the MEDIC hierarchical disease ontology [23] including all child terms to MeSH ID D009358: 'Congenital, Hereditary, and Neonatal Diseases and Abnormalities'. The disease list can be searched by directly typing the specific OMIM ID [1] or a keyword and the auto-completion function will automatically retrieve all the available matching terms."
    ABSTRACT: Background: Mendelian disorders are mostly caused by single mutations in the DNA sequence of a gene, leading to a phenotype with pathologic consequences. Whole Exome Sequencing of patients can be a cost-effective alternative to standard genetic screenings to find causative mutations of genetic diseases, especially when the number of cases is limited. Analyzing exome sequencing data requires specific expertise, high computational resources and a reference variant database to identify pathogenic variants. Results: We developed a database of variations collected from patients with Mendelian disorders, which is automatically populated thanks to an associated exome-sequencing pipeline. The pipeline is able to automatically identify, annotate and store insertions, deletions and mutations in the database. The resource is freely available online. The exome-sequencing pipeline automates the analysis workflow (quality control and read trimming, mapping on reference genome, post-alignment processing, variation calling and annotation) using state-of-the-art software tools. The pipeline has been designed to run on a computing cluster in order to analyse several samples simultaneously. The detected variants are annotated by the pipeline not only with the standard variant annotations (e.g. allele frequency in the general population, the predicted effect on gene product activity, etc.) but, more importantly, with allele frequencies across samples progressively collected in the database itself, stratified by Mendelian disorder. Conclusions: We aim at providing a resource for the genetic disease community to automatically analyse whole exome-sequencing samples with a standard and uniform analysis pipeline, thus collecting variant allele frequencies by disorder. This resource may become a valuable tool to help dissect the genotype underlying the disease phenotype through an improved selection of putative patient-specific causative or phenotype-associated variations.
    BMC Genomics 05/2014; 15(Suppl 3):S5. DOI:10.1186/1471-2164-15-S3-S5 · 3.99 Impact Factor