Mayday--integrative analytics for expression data.

Center for Bioinformatics Tübingen, University of Tübingen, Sand 14, 72076 Tübingen, Germany.
BMC Bioinformatics (Impact Factor: 3.02). 03/2010; 11:121. DOI: 10.1186/1471-2105-11-121
Source: PubMed

ABSTRACT DNA microarrays have become the standard method for large-scale analyses of gene expression and epigenomics. The increasing complexity and inherent noisiness of the generated data make visual data exploration ever more important. Fast deployment of new methods as well as a combination of predefined, easy-to-apply methods with programmer's access to the data are important requirements for any analysis framework. Mayday is an open-source platform with emphasis on visual data exploration and analysis. Many built-in methods for clustering, machine learning and classification are provided for dissecting complex datasets. Plugins can easily be written to extend Mayday's functionality in a large number of ways. As a Java program, Mayday is platform-independent and can be used as a Java WebStart application without any installation. Mayday can import data from several file formats; database connectivity is included for efficient data organization. Numerous interactive visualization tools, including box plots, profile plots, principal component plots and a heatmap, are available; they can be enhanced with metadata and exported as publication-quality vector files.
We have rewritten large parts of Mayday's core to make it more efficient and ready for future developments. Among the large number of new plugins are an automated processing framework, dynamic filtering, new and efficient clustering methods, a machine learning module and database connectivity. Extensive manual data analysis can be done using a built-in R terminal and an integrated SQL querying interface. Our visualization framework has become more powerful: new plot types have been added and existing plots improved.
We present a major extension of Mayday, a very versatile open-source framework for efficient microarray data analysis designed for biologists and bioinformaticians. Most everyday tasks are already covered. The large number of available plugins, as well as the extension possibilities using compiled plugins and ad-hoc scripting, allow for the rapid adaptation of Mayday even to very specialized data exploration. Mayday is available at

  • ABSTRACT: One of the challenges in analyzing high-dimensional expression data is the detection of important biological signals. A common approach is to apply a dimension reduction method, such as principal component analysis. Typically, after application of such a method, the data are projected and visualized in the new coordinate system, using scatter plots or profile plots. These methods provide good results if the data have certain properties which become visible in the new coordinate system and which were hard to detect in the original coordinate system. Often, however, the application of only one method does not suffice to capture all important signals. Therefore, several methods addressing different aspects of the data need to be applied. We have developed a framework for linear and non-linear dimension reduction methods within our visual analytics pipeline SpRay. This includes measures that assist the interpretation of the factorization result. Different visualizations of these measures can be combined with functional annotations that support the interpretation of the results. We show an application to high-resolution time series microarray data in the antibiotic-producing organism Streptomyces coelicolor as well as to microarray data measuring expression of cells with normal karyotype and cells with trisomies of human chromosomes 13 and 21.
    Data Mining and Knowledge Discovery 06/2012; 27(1). · 2.88 Impact Factor
  • ABSTRACT: GlnK is an important nitrogen sensor protein in Streptomyces coelicolor. Deletion of glnK results in a medium-dependent failure of aerial mycelium and spore formation and loss of antibiotic production. Thus, GlnK is not only a regulator of nitrogen metabolism but also of morphological differentiation and secondary metabolite production. Through a comparative transcriptomic approach between the S. coelicolor wild-type and a S. coelicolor glnK mutant strain, 142 genes were identified that are differentially regulated between the two strains. Among these are genes of the ram and rag operons, which are involved in S. coelicolor morphogenesis, as well as genes involved in gas vesicle biosynthesis and ectoine biosynthesis. Surprisingly, no relevant nitrogen genes were found to be differentially regulated, revealing that GlnK is not an important nitrogen sensor under the tested conditions.
    Applied Microbiology and Biotechnology 12/2011; 92(6):1219-36. · 3.69 Impact Factor
  • ABSTRACT: BACKGROUND: Fungi produce a variety of carbohydrate-active enzymes (CAZymes) for the degradation of plant polysaccharide materials to facilitate infection and/or gain nutrition. Identifying and comparing CAZymes from fungi with different nutritional modes or infection mechanisms may provide information for better understanding of their life styles and infection models. To date, hundreds of fungal genomes are publicly available. However, a systematic comparative analysis of fungal CAZymes across the entire fungal kingdom has not been reported. RESULTS: In this study, we systematically identified glycoside hydrolases (GHs), polysaccharide lyases (PLs), carbohydrate esterases (CEs), and glycosyltransferases (GTs) as well as carbohydrate-binding modules (CBMs) in the predicted proteomes of 103 representative fungi from Ascomycota, Basidiomycota, Chytridiomycota, and Zygomycota. Comparative analysis of these CAZymes that play major roles in plant polysaccharide degradation revealed that fungi exhibit tremendous diversity in the number and variety of CAZymes. Among them, some families of GHs and CEs are the most prevalent CAZymes that are distributed in all of the fungi analyzed. Importantly, cellulases of some GH families are present in fungi that are not known to have cellulose-degrading ability. In addition, our results also showed that in general, plant pathogenic fungi have the highest number of CAZymes. Biotrophic fungi tend to have fewer CAZymes than necrotrophic and hemibiotrophic fungi. Pathogens of dicots often contain more pectinases than fungi infecting monocots. Interestingly, besides yeasts, many saprophytic fungi that are highly active in degrading plant biomass contain fewer CAZymes than plant pathogenic fungi. Furthermore, analysis of the gene expression profile of the wheat scab fungus Fusarium graminearum revealed that most of the CAZyme genes related to cell wall degradation were up-regulated during plant infection. Phylogenetic analysis also revealed a complex history of lineage-specific expansions and attritions for the PL1 family. CONCLUSIONS: Our study provides insights into the variety and expansion of fungal CAZyme classes and revealed the relationship of CAZyme size and diversity with their nutritional strategy and host specificity.
    BMC Genomics 04/2013; 14(1):274. · 4.40 Impact Factor
