Article

Molecular Genetics Information System (MOLGENIS): alternatives in developing local experimental genomics databases

Groningen Bioinformatics Center, Faculty of Medical Sciences and Faculty of Mathematics and Natural Sciences, University of Groningen, Groningen, The Netherlands.
Bioinformatics (Impact Factor: 4.62). 10/2004; 20(13):2075-83. DOI: 10.1093/bioinformatics/bth206
Source: DBLP

ABSTRACT Genomic research laboratories need adequate infrastructure to support management of their data production and research workflow. But what makes infrastructure adequate? A lack of appropriate criteria makes any decision on buying or developing a system difficult. Here, we report on the decision process for the case of a molecular genetics group establishing a microarray laboratory.
Five typical requirements for experimental genomics database systems were identified: (i) the ability to evolve with the fast-developing genomics field; (ii) a suitable data model to deal with local diversity; (iii) suitable storage of data files in the system; (iv) easy exchange with other software; and (v) low maintenance costs. The computer scientists and the researchers of the local microarray laboratory considered alternative solutions for these five requirements and chose the following options: (i) use of automatic code generation; (ii) a customized data model based on standards; (iii) storage of datasets as black boxes instead of decomposing them into database tables; (iv) loose linking to other programs for improved flexibility; and (v) a low-maintenance web-based user interface. Our team evaluated existing microarray databases and then decided to build a new system, Molecular Genetics Information System (MOLGENIS), implemented using code generation within three months. This case can provide valuable insights and lessons to both software developers and user communities embarking on large-scale genomic projects.
http://www.molgenis.nl
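
Option (iii) in the abstract, storing datasets as black boxes rather than decomposing them into database tables, can be illustrated with a minimal sketch. It is not taken from MOLGENIS itself: the `dataset` table, its columns, and the helper functions are hypothetical names chosen for illustration, and SQLite stands in for whatever database the original system used.

```python
import sqlite3


def create_store(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per data file, with a little searchable
    # metadata and the file itself kept as an opaque BLOB.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dataset ("
        " id INTEGER PRIMARY KEY,"
        " experiment TEXT NOT NULL,"
        " filename TEXT NOT NULL,"
        " content BLOB NOT NULL)"
    )


def store_dataset(conn: sqlite3.Connection, experiment: str,
                  filename: str, content: bytes) -> int:
    # The file is stored unparsed, so a change in file format needs
    # no change to the database schema.
    cur = conn.execute(
        "INSERT INTO dataset (experiment, filename, content) VALUES (?, ?, ?)",
        (experiment, filename, content),
    )
    conn.commit()
    return cur.lastrowid


def load_dataset(conn: sqlite3.Connection, dataset_id: int) -> bytes:
    # Return the stored bytes exactly as they were uploaded.
    row = conn.execute(
        "SELECT content FROM dataset WHERE id = ?", (dataset_id,)
    ).fetchone()
    if row is None:
        raise KeyError(dataset_id)
    return bytes(row[0])
```

The trade-off in this sketch is that individual measurements inside a file cannot be queried with SQL; only the metadata columns can. In exchange, the schema does not have to follow every change in file formats.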

Available from: Ritsert C Jansen, Aug 28, 2015
  • Source
    • "Figure 1 outlines the ‘model-driven’ development method that several bioinformatics projects adopted in recent years to enable fast and flexible infrastructure development [1], for example Taverna and Galaxy for analysis workflows [6,7], CCPN for processing tools [8], and the early MOLGENIS for biological data management [9]. See our review [1] for a more complete overview. "
    ABSTRACT: There is a huge demand on bioinformaticians to provide their biologists with user friendly and scalable software infrastructures to capture, exchange, and exploit the unprecedented amounts of new *omics data. We here present MOLGENIS, a generic, open source, software toolkit to quickly produce the bespoke MOLecular GENetics Information Systems needed. The MOLGENIS toolkit provides bioinformaticians with a simple language to model biological data structures and user interfaces. At the push of a button, MOLGENIS' generator suite automatically translates these models into a feature-rich, ready-to-use web application including database, user interfaces, exchange formats, and scriptable interfaces. Each generator is a template of SQL, JAVA, R, or HTML code that would require much effort to write by hand. This 'model-driven' method ensures reuse of best practices and improves quality because the modeling language and generators are shared between all MOLGENIS applications, so that errors are found quickly and improvements are shared easily by a re-generation. A plug-in mechanism ensures that both the generator suite and generated product can be customized just as much as hand-written software. In recent years we have successfully evaluated the MOLGENIS toolkit for the rapid prototyping of many types of biomedical applications, including next-generation sequencing, GWAS, QTL, proteomics and biobanking. Writing 500 lines of model XML typically replaces 15,000 lines of hand-written programming code, which allows for quick adaptation if the information system is not yet to the biologist's satisfaction. Each application generated with MOLGENIS comes with an optimized database back-end, user interfaces for biologists to manage and exploit their data, programming interfaces for bioinformaticians to script analysis tools in R, Java, SOAP, REST/JSON and RDF, a tab-delimited file format to ease upload and exchange of data, and detailed technical documentation. 
Existing databases can be quickly enhanced with MOLGENIS-generated interfaces using the 'ExtractModel' procedure. The MOLGENIS toolkit provides bioinformaticians with a simple model to quickly generate flexible web platforms for all possible genomic, molecular and phenotypic experiments, with a richness of interfaces not provided by other tools. All the software and manuals are available free as LGPLv3 open source at http://www.molgenis.org.
    BMC Bioinformatics 12/2010; 11(Suppl 12):S12. DOI:10.1186/1471-2105-11-S12-S12 · 2.67 Impact Factor
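
The 'model-driven' idea described in this abstract, where a compact data model is translated by template-based generators into code, can be sketched in a few lines. This is only an illustration under stated assumptions: the real MOLGENIS toolkit uses its own XML modeling language and generator suite, which are not reproduced here. The `MODEL` dictionary, the type mapping, and `generate_sql` are hypothetical.

```python
from string import Template

# Hypothetical model: entity name -> list of (field name, abstract type).
MODEL = {
    "Sample": [("name", "string"), ("concentration", "decimal")],
    "Hybridization": [("sample_id", "int"), ("dye", "string")],
}

# Assumed mapping from abstract model types to SQL column types.
SQL_TYPES = {
    "string": "VARCHAR(255)",
    "int": "INTEGER",
    "decimal": "DOUBLE PRECISION",
}

# One generator is one template; a real toolkit would have further
# templates for user interfaces, exchange formats, and so on.
TABLE_TEMPLATE = Template(
    "CREATE TABLE $table (\n  id INTEGER PRIMARY KEY,\n$columns\n);"
)


def generate_sql(model: dict) -> str:
    """Render one CREATE TABLE statement per entity in the model."""
    statements = []
    for entity, fields in model.items():
        columns = ",\n".join(
            f"  {name} {SQL_TYPES[ftype]} NOT NULL" for name, ftype in fields
        )
        statements.append(
            TABLE_TEMPLATE.substitute(table=entity.lower(), columns=columns)
        )
    return "\n\n".join(statements)


if __name__ == "__main__":
    print(generate_sql(MODEL))
```

Because every application would share the same template, a fix to `TABLE_TEMPLATE` reaches all generated schemas on the next regeneration, which is the reuse argument the abstract makes.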
  • Source
    • "The scanned images, data, and experimental conditions were stored in the MIAME-compliant Molecular Genetics Information System (MolGenIS) [31]. "
    ABSTRACT: In research laboratories using DNA-microarrays, usually a number of researchers perform experiments, each generating possible sources of error. There is a need for a quick and robust method to assess data quality and sources of errors in DNA-microarray experiments. To this end, a novel and cost-effective validation scheme was devised, implemented, and employed. A number of validation experiments were performed on Lactococcus lactis IL1403 amplicon-based DNA-microarrays. Using the validation scheme and ANOVA, the factors contributing to the variance in normalized DNA-microarray data were estimated. Day-to-day as well as experimenter-dependent variances were shown to contribute strongly to the variance, while dye and culturing had a relatively modest contribution to the variance. Even in cases where 90% of the data were kept for analysis and the experiments were performed under challenging conditions (e.g. on different days), the CV was at an acceptable 25%. Clustering experiments showed that trends can be reliably detected also from genes with very low expression levels. The validation scheme thus allows determining conditions that could be improved to yield even higher DNA-microarray data quality.
    BMC Genomics 02/2005; 6(1):77. DOI:10.1186/1471-2164-6-77 · 4.04 Impact Factor
  • Source
    ABSTRACT: Background: In research laboratories using DNA-microarrays, usually a number of researchers perform experiments, each generating possible sources of error. There is a need for a quick and robust method to assess data quality and sources of errors in DNA-microarray experiments. To this end, a novel and cost-effective validation scheme was devised, implemented, and employed. Results: A number of validation experiments were performed on Lactococcus lactis IL1403 amplicon-based DNA-microarrays. Using the validation scheme and ANOVA, the factors contributing to the variance in normalized DNA-microarray data were estimated. Day-to-day as well as experimenter-dependent variances were shown to contribute strongly to the variance, while dye and culturing had a relatively modest contribution to the variance. Conclusions: Even in cases where 90 % of the data were kept for analysis and the experiments were performed under challenging conditions (e.g. on different days), the CV was at an acceptable 25 %. Clustering experiments showed that trends can be reliably detected also from (very) lowly expressed genes. The validation scheme thus allows determining conditions that could be improved to yield even higher DNA-microarray data quality.
    Genome Biology 01/2005; 6(1). DOI:10.1186/gb-2005-6-4-p4 · 10.47 Impact Factor