Data Management in Structural Genomics: An Overview

Yeast Structural Genomics, IBBMC, Université Paris-Sud, Orsay, France.
Methods in Molecular Biology (Impact Factor: 1.29). 02/2008; 426:49-79. DOI: 10.1007/978-1-60327-058-8_4
Source: PubMed


Data management has been identified as a crucial issue in all large-scale experimental projects. In such projects, many people manipulate multiple objects in different locations; unless complete and accurate records are maintained, it is extremely difficult to establish exactly what was done, when it was done, who did it, and which protocol was used. All of this information is essential for preparing publications, reusing successful protocols, determining why a target has failed, and validating and optimizing protocols. Although data management solutions have been in place for certain focused activities (e.g., genome sequencing and microarray experiments), they are only now emerging for broader projects, such as structural genomics, metabolomics, and systems biology as a whole. The complexity of experimental procedures, and the diversity and high rate of development of protocols used in a single center or across various centers, have important consequences for the design of information management systems. Because procedures are carried out both by machines and by hand, the system must be capable of handling data entry both from robotic systems and through a user-friendly interface. The information management system needs to be flexible so that it can handle changes to existing protocols as well as newly added protocols. Because no commercial information management system has had the needed features, most structural genomics groups have developed their own solutions. This chapter discusses the advantages of using a LIMS (laboratory information management system) for day-to-day management of structural genomics projects and for data mining.
This chapter reviews different solutions currently in place or under development, with emphasis on three systems developed by the authors: Xtrack, Sesame (developed at the Center for Eukaryotic Structural Genomics under the US Protein Structural Genomics Initiative), and HalX (developed at the Yeast Structural Genomics Laboratory, in collaboration with the European SPINE project).

    • "Laboratory Information and Management Systems (LIMS) or Sample Management Systems (SMS) are bioinformatics tools that aid experimentalists to organize samples and experimental procedures in a controlled and annotated fashion. There are several commercial and free dedicated LIMS that have been developed specifically for genotyping labs where thousands of samples are processed by automated pipelines and procedures are tightly standardized [7-9]. One popular LIMS for genomics is BASE [10]."
    ABSTRACT: High-throughput sequencing assays are now routinely used to study different aspects of genome organization. As decreasing costs and widespread availability of sequencing enable more laboratories to use sequencing assays in their research projects, the number of samples and replicates in these experiments can quickly grow to several dozens of samples and thus require standardized annotation, storage and management of preprocessing steps. As a part of the STATegra project, we have developed an Experiment Management System (EMS) for high throughput omics data that supports different types of sequencing-based assays such as RNA-seq, ChIP-seq, Methyl-seq, etc, as well as proteomics and metabolomics data. The STATegra EMS provides metadata annotation of experimental design, samples and processing pipelines, as well as storage of different types of data files, from raw data to ready-to-use measurements. The system has been developed to provide research laboratories with a freely-available, integrated system that offers a simple and effective way for experiment annotation and tracking of analysis procedures.
    Full-text · Article · Mar 2014 · BMC Systems Biology
    • "The file-based communications and data transactions become time consuming and sometimes unmanageable. Efficient data management techniques have been developed at SP centers to keep track of the enormous amount of data generated, which minimize duplication of effort and maximize the chances of success at each step (Haquin et al. 2008). Notable laboratory information management systems (LIMS) of SP centers include Sesame (Zolnai et al. 2003), HalX (Prilusky et al. 2005), PiMS (Morris et al. 2011), SPEX Db (Raymond et al. 2004) and IceDB. "
    ABSTRACT: A broad working definition of structural proteomics (SP) is that it is the process of the high-throughput characterization of the three-dimensional structures of biological macromolecules. Recently, the process for protein structure determination has become highly automated and SP platforms have been established around the globe, utilizing X-ray crystallography as a tool. Although protein structures often provide clues about the biological function of a target, once the three-dimensional structures have been determined, bioinformatics and proteomics-driven strategies can be employed to derive their biological activities and physiological roles. This article reviews the current status of SP methods for the structure determination pipeline, including target selection, isolation, expression, purification, crystallization, diffraction data collection, structure solution, refinement and functional annotation.
    Full-text · Article · Jun 2012
    • "With this new field emerging, computer-aided technologies related to omics approaches are necessary for accumulating and processing experimental data. Further, handling tools are needed [18], [19], [20]. Based on the R platform, there are freely available tools to analyze omics datasets, such as the “ape” R package to visualize phylogenetic trees using genomic sequences. "
    ABSTRACT: Ecosystems can be conceptually thought of as interconnected environmental and metabolic systems, in which small molecules to macro-molecules interact through diverse networks. State-of-the-art technologies in post-genomic science offer ways to inspect and analyze this biomolecular web using omics-based approaches. Exploring useful genes and enzymes, as well as biomass resources responsible for anabolism and catabolism within ecosystems will contribute to a better understanding of environmental functions and their application to biotechnology. Here we present ECOMICS, a suite of web-based tools for ECosystem trans-OMICS investigation that target metagenomic, metatranscriptomic, and meta-metabolomic systems, including biomacromolecular mixtures derived from biomass. ECOMICS is made of four integrated webtools. E-class allows for the sequence-based taxonomic classification of eukaryotic and prokaryotic ribosomal data and the functional classification of selected enzymes. FT2B allows for the digital processing of NMR spectra for downstream metabolic or chemical phenotyping. Bm-Char allows for statistical assignment of specific compounds found in lignocellulose-based biomass, and HetMap is a data matrix generator and correlation calculator that can be applied to trans-omics datasets as analyzed by these and other web tools. This web suite is unique in that it allows for the monitoring of biomass metabolism in a particular environment, i.e., from macromolecular complexes (FT2DB and Bm-Char) to microbial composition and degradation (E-class), and makes possible the understanding of relationships between molecular and microbial elements (HetMap). This website is available to the public domain at:
    Full-text · Article · Feb 2012 · PLoS ONE