The Comparative Toxicogenomics Database (CTD)

Department of Bioinformatics, Mount Desert Island Biological Laboratory, Salsbury Cove, Maine 04672, USA.
Environmental Health Perspectives (Impact Factor: 7.98). 06/2003; 111(6):793-5. DOI: 10.1289/txg.6028
Source: PubMed


The Mount Desert Island Biological Laboratory in Salsbury Cove, Maine, USA, is developing the Comparative Toxicogenomics Database (CTD), a community-supported genomic resource devoted to genes and proteins of human toxicologic significance. CTD will be the first publicly available database to a) provide annotated associations among genes, proteins, references, and toxic agents, with a focus on annotating data from aquatic and mammalian organisms; b) include nucleotide and protein sequences from diverse species; c) offer a range of analysis tools for customized comparative studies; and d) provide information to investigators on available molecular reagents. This combination of features will facilitate cross-species comparisons of toxicologically significant genes and proteins. These comparisons will promote understanding of molecular evolution, the significance of conserved sequences, the genetic basis of variable sensitivity to environmental agents, and the complex interactions between the environment and human health. CTD is currently under development, and the planned scope and functions of the database are described herein. The intent of this report is to invite community participation in the development of CTD to ensure that it will be a valuable resource for environmental health, molecular biology, and toxicology research.


Available from: John N Forrest, Sep 27, 2014
  • Source
    • "A large amount of GeneWeaver data comes from major bioinformatics resources including NCBI, ENSEMBL and various model organism databases, including MGD (9), Rat Genome Database [RGD (27)], HUGO Gene Nomenclature Committee [HGNC (28)], Saccharomyces Genome Database [SGD (29)], FlyBase (30), WormBase (31) and the Zebrafish Model Organism Database [ZFIN (32)]. Some of these data are converted to gene sets, including GO and MP annotations, Comparative Toxicogenomics Database (33) associations and QTL positional candidates from RGD and MGI. These data sources are updated every 6 months. "
    ABSTRACT: High-throughput genome technologies have produced a wealth of data on the association of genes and gene products to biological functions. Investigators have discovered value in combining their experimental results with published genome-wide association studies, quantitative trait locus, microarray, RNA-sequencing and mutant phenotyping studies to identify gene-function associations across diverse experiments, species, conditions, behaviors or biological processes. These experimental results are typically derived from disparate data repositories, publication supplements or reconstructions from primary data stores. This leaves bench biologists with the complex and unscalable task of integrating data by identifying and gathering relevant studies, reanalyzing primary data, unifying gene identifiers and applying ad hoc computational analysis to the integrated set. The freely available GeneWeaver, powered by the Ontological Discovery Environment, is a curated repository of genomic experimental results with an accompanying tool set for dynamic integration of these data sets, enabling users to interactively address questions about sets of biological functions and their relations to sets of genes. Thus, large numbers of independently published genomic results can be organized into new conceptual frameworks driven by the underlying, inferred biological relationships rather than a pre-existing semantic framework. An empirical 'ontology' is discovered from the aggregate of experimental knowledge around user-defined areas of biological inquiry.
    Nucleic Acids Research 11/2011; 40(Database issue):D1067-76. DOI:10.1093/nar/gkr968 · 9.11 Impact Factor
  • Source
    • "Despite this observation, the mechanism of action and the potential influences of most chemicals on many diseases are not known [19,20,31,32]. To gain a better understanding about the impact environmental chemicals have on human health, the Comparative Toxicogenomics Database (CTD) [33,34] has been developed by Mount Desert Island Biological Laboratory. It serves as a unique centralised and freely available resource to explore the interactions amongst chemicals, genes or proteins and diseases in diverse species. "
    ABSTRACT: Due to recent advances in data storage and sharing for further data processing in predictive toxicology, there is an increasing need for flexible data representations, secure and consistent data curation and automated data quality checking. Toxicity prediction involves multidisciplinary data. There are hundreds of collections of chemical, biological and toxicological data that are widely dispersed, mostly in the open literature, professional research bodies and commercial companies. In order to better manage and make full use of such a large amount of toxicity data, there is a trend to develop functionalities aiming towards data governance in predictive toxicology to formalise a set of processes to guarantee high data quality and better data management. In this paper, data quality is meant mainly in a data storage sense (e.g. accuracy, completeness and integrity) and not in a toxicological sense (e.g. the quality of experimental results). This paper reviews seven widely used predictive toxicology data sources and applications, with a particular focus on their data governance aspects, including: data accuracy, data completeness, data integrity, metadata and its management, data availability and data authorisation. This review reveals the current problems (e.g. lack of systematic and standard measures of data quality) and desirable needs (e.g. better management and further use of captured metadata and the development of flexible multi-level user access authorisation schemas) of predictive toxicology data sources development. The analytical results will help to address a significant gap in toxicology data quality assessment and lead to the development of novel frameworks for predictive toxicology data and model governance. While the discussed public data sources are well developed, there nevertheless remain some gaps in the development of a data governance framework to support predictive toxicology. In this paper, data governance is identified as the new challenge in predictive toxicology, and a good use of it may provide a promising framework for developing high quality and easily accessible toxicity data repositories. This paper also identifies important research directions that require further investigation in this area.
    Journal of Cheminformatics 07/2011; 3(1):24. DOI:10.1186/1758-2946-3-24 · 4.55 Impact Factor
  • Source
    • "We have created a single repository called Chem2Bio2RDF by aggregating data from multiple repositories including PubChem Bioassay [28], DrugBank [29], KEGG Ligand [30], CTD [31], BindingDB [32], PharmGKB [33], MATADOR [34], and a number of small QSAR sets available on the web [35]. A schema of the data sources has been created, and the data in these sets are represented as RDF triples, that link chemical compounds (as identified by a PubChem ID) with targets, genes, side effects, diseases and publications (Figure 1). "
    ABSTRACT: Recently there has been an explosion of new data sources about genes, proteins, genetic variations, chemical compounds, diseases and drugs. Integration of these data sources and the identification of patterns that go across them is of critical interest. Initiatives such as Bio2RDF and LODD have tackled the problem of linking biological data and drug data respectively using RDF. Thus far, the inclusion of chemogenomic and systems chemical biology information that crosses the domains of chemistry and biology has been very limited. We have created a single repository called Chem2Bio2RDF by aggregating data from multiple chemogenomics repositories that is cross-linked into Bio2RDF and LODD. We have also created a linked-path generation tool to facilitate SPARQL query generation, and have created extended SPARQL functions to address specific chemical/biological search needs. We demonstrate the utility of Chem2Bio2RDF in investigating polypharmacology, identification of potential multiple pathway inhibitors, and the association of pathways with adverse drug reactions. We have created a new semantic systems chemical biology resource, and have demonstrated its potential usefulness in specific examples of polypharmacology, multiple pathway inhibition and adverse drug reaction--pathway mapping. We have also demonstrated the usefulness of extending SPARQL with cheminformatics and bioinformatics functionality.
    BMC Bioinformatics 05/2010; 11(1):255. DOI:10.1186/1471-2105-11-255 · 2.58 Impact Factor