SDRF2GRAPH – a visualization tool of a spreadsheet-based description of experimental processes

RIKEN Omics Science Center, RIKEN Yokohama Institute, Yokohama, Kanagawa, Japan.
BMC Bioinformatics (Impact Factor: 2.58). 06/2009; 10(1):133. DOI: 10.1186/1471-2105-10-133
Source: PubMed


As genome-scale experimental techniques produce ever larger datasets, it has become essential to explicitly describe the meta-data (information describing the data) generated by an experiment. The experimental process is part of the meta-data required to interpret the produced data, and SDRF (Sample and Data Relationship Format) supports its description in a spreadsheet or tab-delimited file. This format was primarily developed to describe microarray studies in MAGE-tab, and it is being applied in a broader context in ISA-tab. While the format provides an explicit framework to describe experiments, the content of an SDRF file becomes harder to understand as the number of experimental steps increases.
Here, we describe a new tool, SDRF2GRAPH, for displaying the experimental steps described in an SDRF file as an investigation design graph, a directed acyclic graph representing experimental steps. A spreadsheet used to edit and inspect the descriptions, in Microsoft Excel for example, can be input directly via a web-based interface without conversion to tab-delimited text. This makes it much easier to organize large SDRF contents described across multiple spreadsheets.
SDRF2GRAPH is applicable to a wide range of SDRF files, covering not only microarray-based analysis but also other genome-scale technologies such as next-generation sequencers. Visualizing the Investigation Design Graph (IDG) structure makes the experimental process described in SDRF files easy to understand even when the experiment is complicated, and it also encourages the creation of SDRF files by providing prompt visual feedback.
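
The idea of reading an SDRF table as a directed acyclic graph can be illustrated with a minimal Python sketch. This is not the SDRF2GRAPH implementation. It assumes, for illustration only, that columns whose header ends in " Name" or " File" are graph nodes, that "Protocol REF" columns label the edge leading to the next node, that all other columns are attributes and can be ignored, and that node names are unique across columns. The function names and the example table are hypothetical; the output is Graphviz DOT text.

```python
import csv
import io


def sdrf_to_edges(sdrf_text):
    """Collect (source, protocol, target) edges from tab-delimited SDRF text.

    Assumption (illustrative): headers ending in " Name" or " File" are
    nodes; "Protocol REF" cells label the edge to the next node in the row.
    """
    rows = list(csv.reader(io.StringIO(sdrf_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    edges = set()
    for row in body:
        previous_node = None
        protocols = []
        for column, value in zip(header, row):
            value = value.strip()
            if not value:
                continue  # empty cell: nothing to record
            if column == "Protocol REF":
                protocols.append(value)
            elif column.endswith(" Name") or column.endswith(" File"):
                if previous_node is not None:
                    edges.add((previous_node, ", ".join(protocols), value))
                previous_node = value
                protocols = []
            # any other column (attributes, comments) is ignored here
    return edges


def _quote(text):
    """Quote a string for use as a DOT identifier or label."""
    return '"' + text.replace('"', '\\"') + '"'


def edges_to_dot(edges):
    """Render the edges as Graphviz DOT text, left to right."""
    lines = ["digraph IDG {", "  rankdir=LR;"]
    for source, protocol, target in sorted(edges):
        lines.append(
            f"  {_quote(source)} -> {_quote(target)} [label={_quote(protocol)}];"
        )
    lines.append("}")
    return "\n".join(lines)


# Hypothetical two-row SDRF table: one source material split into two samples.
EXAMPLE = (
    "Source Name\tProtocol REF\tSample Name\tProtocol REF\tAssay Name\n"
    "cell_A\tculture\tsample_1\tsequencing\trun_1\n"
    "cell_A\tculture\tsample_2\tsequencing\trun_2\n"
)

if __name__ == "__main__":
    print(edges_to_dot(sdrf_to_edges(EXAMPLE)))
```

Because edges are stored in a set, rows that share upstream steps (here, both rows starting from `cell_A`) merge into a single branching graph instead of repeating the shared node.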

  • ABSTRACT: In FANTOM4, an international collaborative research project, we collected a wide range of genome-scale data, including 24 million mRNA 5'-reads (CAGE tags) and microarray expression profiles along a differentiation time course of the human THP-1 cell line and under 52 systematic siRNA perturbations. In addition, data regarding chromatin status derived from ChIP-chip to elucidate the transcriptional regulatory interactions are included. Here we present these data to the research community as an integrated web resource.
    Genome biology 05/2009; 10(4):R40. DOI:10.1186/gb-2009-10-4-r40 · 10.81 Impact Factor
  • ABSTRACT: Large systems biology projects can encompass several workgroups, often located in different countries. An overview is given of existing data standards in systems biology and of the management, storage, exchange and integration of the data generated in large distributed research projects. The pros and cons of the different approaches are illustrated from a practical point of view, and the existing software (open source as well as commercial) and the relevant literature are extensively reviewed, so that readers can decide which data management approach best suits their needs. Emphasis is placed on the use of workflow systems and of TAB-based formats. Data in this format can be viewed and edited easily using spreadsheet programs, which are familiar to working experimental biologists. The use of workflows for standardized access to data in either in-house or publicly available databanks and for the standardization of operating procedures is presented. The use of ontologies and semantic web technologies for data management will be discussed in a further paper. Comment: 20 pages
  • ABSTRACT: The international Functional Annotation Of the Mammalian Genomes 4 (FANTOM4) research collaboration set out to better understand the transcriptional network that regulates macrophage differentiation and to uncover novel components of the transcriptome, employing a series of high-throughput experiments. The primary and unique technique is cap analysis of gene expression (CAGE), which sequences mRNA 5'-ends with a second-generation sequencer to quantify promoter activities even in the absence of gene annotation. Additional genome-wide experiments complement the setup, including short RNA sequencing, microarray gene expression profiling on large-scale perturbation experiments and ChIP-chip for epigenetic marks and transcription factors. All the experiments are performed in a differentiation time course of the THP-1 human leukemic cell line. Furthermore, we performed a large-scale mammalian two-hybrid (M2H) assay between transcription factors and monitored their expression profile across human and mouse tissues with qRT-PCR to address combinatorial effects of regulation by transcription factors. These interdependent data have been analyzed individually and in combination with each other and are published in related but distinct papers. We provide all data together with systematic annotation in an integrated view as a resource for the scientific community. Additionally, we assembled a rich set of derived analysis results including published predicted and validated regulatory interactions. Here we introduce the resource and its update after the initial release.
    Nucleic Acids Research 11/2010; 39(Database issue):D856-60. DOI:10.1093/nar/gkq1112 · 9.11 Impact Factor

