CEBS-Chemical Effects in Biological Systems: A public data repository integrating study design and toxicity data with microarray and proteomics data

NIEHS, National Center for Toxicogenomics, PO Box 12233, Research Triangle Park, NC 27709, USA.
Nucleic Acids Research (Impact Factor: 9.11). 02/2008; 36(Database issue):D892-900. DOI: 10.1093/nar/gkm755
Source: PubMed


CEBS (Chemical Effects in Biological Systems) is an integrated public repository for toxicogenomics data, including the study design and timeline, clinical chemistry and histopathology findings, and microarray and proteomics data. CEBS contains data derived from studies of chemicals and of genetic alterations, and is compatible with clinical and environmental studies. CEBS is designed to let the user query the data by study conditions and subject responses and then, having identified an appropriate set of subjects, move to the microarray module of CEBS to carry out gene signature and pathway analysis. Scope of CEBS: CEBS currently holds 22 studies of rats, four studies of mice and one study of Caenorhabditis elegans, and can also accommodate data from studies of human subjects. Toxicogenomics studies currently in CEBS comprise over 4000 microarray hybridizations, and 75 2D gel images annotated with protein identifications performed by MALDI and MS/MS. CEBS contains raw microarray data collected in accordance with MIAME guidelines and provides tools for data selection, pre-processing and analysis, resulting in annotated lists of genes of interest. Additionally, clinical chemistry and histopathology findings from over 1500 animals are included in CEBS. CEBS/BID: The BID (Biomedical Investigation Database) is another component of the CEBS system. BID is a relational database used to load and curate study data prior to export to CEBS, in addition to capturing and displaying novel data types such as PCR data, or additional fields of interest, including those defined by the HESI Toxicogenomics Committee (in preparation). BID has been shared with Health Canada and the US Environmental Protection Agency. CEBS is available online, and BID can be accessed via its user interface. Requests for a copy of BID and for depositing data into CEBS or BID can be made through the same site.
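The two-step query workflow the abstract describes (filter subjects by study conditions and clinical responses, then hand the matching subjects to the microarray module) can be sketched as follows. All record structures, field names and values here are illustrative assumptions, not the actual CEBS schema or API.

```python
# Hypothetical sketch of the CEBS query workflow: select subjects by
# study condition and clinical-chemistry response, then collect their
# microarray hybridizations for downstream pathway analysis.
# Field names (chemical, alt_iu_per_l, arrays) are made up for this example.

studies = [
    {"subject": "rat-01", "chemical": "acetaminophen",
     "alt_iu_per_l": 310, "arrays": ["HYB-a1", "HYB-a2"]},
    {"subject": "rat-02", "chemical": "acetaminophen",
     "alt_iu_per_l": 42, "arrays": ["HYB-b1"]},
    {"subject": "rat-03", "chemical": "vehicle control",
     "alt_iu_per_l": 38, "arrays": ["HYB-c1"]},
]

def select_subjects(records, chemical, min_alt):
    """Step 1: query by study condition and subject response
    (here, serum ALT above a chosen threshold)."""
    return [r for r in records
            if r["chemical"] == chemical and r["alt_iu_per_l"] >= min_alt]

def collect_arrays(subjects):
    """Step 2: gather the hybridizations for the selected subjects,
    as input to gene-signature and pathway analysis."""
    return [a for s in subjects for a in s["arrays"]]

responders = select_subjects(studies, "acetaminophen", min_alt=100)
arrays = collect_arrays(responders)
```

In this toy run, only `rat-01` passes the response filter, so `arrays` holds its two hybridizations; in CEBS the same filtering would be done through the repository's query interface rather than in client code.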

    • "…ics Project (TGP) [Uehara et al., 2010], and DrugMatrix [Ganter et al., 2006] databases. Gene expression, transcriptomic and other toxicogenomic datasets are available through the Comparative Toxicogenomics Database (CTD) [Mattingly et al., 2006] and Chemical Effects in Biological Systems (CEBS) [Waters et al., 2008a]. In July 2012, CTD contained manually curated data on 5,99,182 chemical-gene interactions, 1,76,627 chemical-disease, and 23,395 gene-disease relationships, internal integration of which leads to >10.1 million inferred gene-disease relationships and 9,13,622 inferred chemical-disease relationships [Davis et al., 2013]. Integration with GO, KEGG and Reactome provides 15.6 million toxicogenomic relationships for analysis, 3.6-fold and 10.6-fold …"
    ABSTRACT: The human transcriptome is complex, comprising multiple transcript types, mostly in the form of non-coding RNA (ncRNA). The majority of ncRNA is of the long form (lncRNA, ≥ 200 bp), which plays an important role in gene regulation through multiple mechanisms including epigenetics, chromatin modification, control of transcription factor binding, and regulation of alternative splicing. Both mRNA and ncRNA exhibit additional variability in the form of alternative splicing and RNA editing. All aspects of the human transcriptome can potentially be dysregulated by environmental exposures. Next-generation RNA sequencing (RNA-Seq) is the best available methodology to measure this although it has limitations, including experimental bias. The third phase of the MicroArray Quality Control Consortium project (MAQC-III), also called Sequencing Quality Control (SeQC), aims to address these limitations through standardization of experimental and bioinformatic methodologies. A limited number of toxicogenomic studies have been conducted to date using RNA-Seq. This review describes the complexity of the human transcriptome, the application of transcriptomics by RNA-Seq or microarray in molecular epidemiology studies, and limitations of these approaches including the type of cell or tissue analyzed, experimental variation, and confounding. By using good study designs with precise, individual exposure measurements, sufficient power and incorporation of phenotypic anchors, studies in human populations can identify biomarkers of exposure and/or early effect and elucidate mechanisms of action underlying associated diseases, even at low doses. Analysis of datasets at the pathway level can compensate for some of the limitations of RNA-Seq and, as more datasets become available, will increasingly elucidate the exposure-disease continuum. Environ. Mol. Mutagen., 2013. © 2013 Wiley Periodicals, Inc.
    Full-text · Article · Aug 2013 · Environmental and Molecular Mutagenesis
    • "Example concentration–response profiles and their activity calls from qHTS data generated with the NTP compound library used in Tox21 Phase I are shown in Figure 2. More extensive data can be found in NTP's Chemical Effects in Biological Systems database (Waters et al. 2008). "
    ABSTRACT: Background: The ability of a substance to induce a toxicological response is better understood by analyzing the response profile over a broad range of concentrations than at a single concentration. In vitro quantitative high throughput screening (qHTS) assays are multiple-concentration experiments with an important role in the National Toxicology Program's (NTP) efforts to advance toxicology from a predominantly observational science at the level of disease-specific models to a more predictive science based on broad inclusion of biological observations. Objective: We developed a systematic approach to classify substances from large-scale concentration–response data into statistically supported, toxicologically relevant activity categories. Methods: The first stage of the approach finds active substances with robust concentration–response profiles within the tested concentration range. The second stage finds substances with activity at the lowest tested concentration not captured in the first stage. The third and final stage separates statistically significant (but not robustly statistically significant) profiles from responses that lack statistically compelling support (i.e., "inactives"). The performance of the proposed algorithm was evaluated with simulated qHTS data sets. Results: The proposed approach performed well for 14-point concentration–response curves with typical levels of residual error (σ ≤ 25%) or when maximal response (|RMAX|) was > 25% of the positive control response. The approach also worked well in most cases for smaller sample sizes when |RMAX| ≥ 50%, even with as few as four data points. Conclusions: The three-stage classification algorithm performed better than one-stage classification approaches based on overall F-tests, t-tests, or linear regression.
    Full-text · Article · May 2012 · Environmental Health Perspectives
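The staged activity-call logic summarized in that abstract can be illustrated with a minimal sketch. The thresholds and decision rules below are illustrative placeholders, not the published three-stage algorithm, which uses formal statistical tests rather than fixed multiples of the residual error.

```python
# Simplified sketch of a three-stage activity-call scheme for a qHTS
# concentration-response curve. `responses` are mean % responses at
# increasing concentrations; `sigma` is the residual error in %.
# Thresholds (3*sigma, 2*sigma) are illustrative assumptions only.

def classify_curve(responses, sigma=25.0):
    rmax = max(responses, key=abs)  # signed maximal response
    # Stage 1: robust concentration-response within the tested range
    # (large maximal response that grows with concentration).
    if abs(rmax) > 3 * sigma and abs(responses[-1]) > abs(responses[0]):
        return "active"
    # Stage 2: activity already present at the lowest tested
    # concentration, missed by stage 1.
    if abs(responses[0]) > 3 * sigma:
        return "active at lowest concentration"
    # Stage 3: statistically significant but not robust, versus inactive.
    if abs(rmax) > 2 * sigma:
        return "marginal"
    return "inactive"
```

A rising curve such as `[2, 5, 20, 60, 95]` is called active, a flat high curve such as `[90, 88, 91, 89]` is caught at stage 2, and noise-level responses fall through to "inactive"; the published method makes these same distinctions with significance tests on fitted curves.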
    • "Chemical Effects in Biological Systems (CEBS) is the first public repository which captures toxicogenomics data, developed by the National Center for Toxicogenomics (NCT) within the National Institute of Environmental Health Sciences (NIEHS) [23,29]. A distinguishing feature of CEBS is that it contains very detailed animal-level study information including treatment protocols, study design, study timeline, metadata for microarray and proteomics data, histopathology and even raw genomic microarray results [23]. "
    ABSTRACT: Due to recent advances in data storage and sharing for further data processing in predictive toxicology, there is an increasing need for flexible data representations, secure and consistent data curation and automated data quality checking. Toxicity prediction involves multidisciplinary data. There are hundreds of collections of chemical, biological and toxicological data that are widely dispersed, mostly in the open literature, professional research bodies and commercial companies. In order to better manage and make full use of such large amount of toxicity data, there is a trend to develop functionalities aiming towards data governance in predictive toxicology to formalise a set of processes to guarantee high data quality and better data management. In this paper, data quality mainly refers in a data storage sense (e.g. accuracy, completeness and integrity) and not in a toxicological sense (e.g. the quality of experimental results). This paper reviews seven widely used predictive toxicology data sources and applications, with a particular focus on their data governance aspects, including: data accuracy, data completeness, data integrity, metadata and its management, data availability and data authorisation. This review reveals the current problems (e.g. lack of systematic and standard measures of data quality) and desirable needs (e.g. better management and further use of captured metadata and the development of flexible multi-level user access authorisation schemas) of predictive toxicology data sources development. The analytical results will help to address a significant gap in toxicology data quality assessment and lead to the development of novel frameworks for predictive toxicology data and model governance. While the discussed public data sources are well developed, there nevertheless remain some gaps in the development of a data governance framework to support predictive toxicology. 
In this paper, data governance is identified as a new challenge in predictive toxicology, and good use of it may provide a promising framework for developing high-quality and easily accessible toxicity data repositories. This paper also identifies important research directions that require further investigation in this area.
    Full-text · Article · Jul 2011 · Journal of Cheminformatics