Magellan: A Web Based System for the Integrated Analysis of Heterogeneous Biological Data and Annotations; Application to DNA Copy Number and Expression Data in Ovarian Cancer

UCSF Cancer Research Institute and Comprehensive Cancer Center, University of California, San Francisco, 2340 Sutter St., San Francisco, California, USA.
Cancer informatics 02/2007; 2:10-21.
Source: PubMed

ABSTRACT: Recent advances in high-throughput biological methods allow researchers to generate enormous amounts of data from a single experiment. In order to extract meaningful conclusions from this tidal wave of data, it will be necessary to develop analytical methods of sufficient power and utility. It is particularly important that biologists themselves be able to perform many of these analyses, so that their background knowledge of the experimental system under study can be used to interpret results and direct further inquiries. We have developed a web-based system, Magellan, which allows the upload, storage, and analysis of multivariate data and textual or numerical annotations. Data and annotations are treated as abstract entities, to maximize the different types of information the system can store and analyze. Annotations can be used in analyses/visualizations, as a means of subsetting data to reduce dimensionality, or as a means of projecting variables from one data type or data set to another. Analytical methods are deployed within Magellan such that new functionalities can be added in a straightforward fashion. Using Magellan, we performed an integrated analysis of genome-wide comparative genomic hybridization (CGH), mRNA expression, and clinical data from ovarian tumors. Analyses included the use of permutation-based methods to identify genes whose mRNA expression levels correlated with patient survival, a nearest-neighbor classifier to predict patient survival from CGH data, and the use of curated annotations (such as genomic position) and derived annotations (such as statistical computations) to explore the quantitative relationship between CGH and mRNA expression data.
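
The abstract mentions permutation-based methods for finding genes whose expression correlates with patient survival, but does not specify the test statistic or procedure. As a rough illustration only, the sketch below assumes a Pearson correlation statistic and uncensored survival times; the function names and the sample values are hypothetical and are not taken from Magellan.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(expression, survival, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the expression/survival correlation.

    Survival labels are shuffled across patients to build the null
    distribution. Censoring is ignored (a simplifying assumption).
    """
    rng = random.Random(seed)
    observed = abs(pearson(expression, survival))
    shuffled = list(survival)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(expression, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Made-up values for one gene across ten patients (illustration only).
expression = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8, 7.2, 8.1, 8.8, 10.0]
survival_months = [12, 15, 14, 20, 22, 25, 27, 30, 33, 35]

p = permutation_p_value(expression, survival_months)
print(f"permutation p-value: {p:.4f}")
```

In a genome-wide setting this would be repeated per gene, and the resulting p-values would need a multiple-testing correction, which is omitted here.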

  • ABSTRACT: Array comparative genomic hybridization (array CGH) is a technique for assaying the copy number status of cancer genomes. The widespread use of this technology has led to a rapid accumulation of high-throughput data, which in turn has prompted the development of computational strategies for the analysis of array CGH data. Here we explain the principles behind array image processing, data visualization and genomic profile analysis, review currently available software packages, and raise considerations for future software development.
    Cancer informatics 02/2006; 2:48-58.
  • ABSTRACT: With the increasing application of various genomic technologies in biomedical research, there is a need to integrate these data to correlate candidate genes/regions that are identified by different genomic platforms. Although there are tools that can analyze data from individual platforms, essential software for integration of genomic data is still lacking. Here, we present a novel Java-based program called CGI (Cytogenetics-Genomics Integrator) that matches the BAC clones from array-based comparative genomic hybridization (aCGH) to genes from RNA expression profiling datasets. The matching is computed via a fast, backend MySQL database containing UCSC Genome Browser annotations. This program also provides an easy-to-use graphical user interface for visualizing and summarizing the correlation of DNA copy number changes and RNA expression patterns from a set of experiments. In addition, CGI uses a Java applet to display the copy number values of a specific BAC clone in aCGH experiments side by side with the expression levels of genes that are mapped back to that BAC clone from the microarray experiments. The CGI program is built on top of extensible, reusable graphic components specifically designed for biologists. It is cross-platform compatible and the source code is freely available under the General Public License.
    Gene regulation and systems biology 10/2007; 1:131-6.
  • ABSTRACT: Over the last decade, multiple functional genomic datasets studying chromosomal aberrations and their downstream effects on gene expression have accumulated for several cancer types. A vast majority of them are in the form of paired gene expression profiles and somatic copy number alterations (CNA) information on the same patients identified using microarray platforms. In response, many algorithms and software packages are available for integrating these paired data. Surprisingly, there has been no serious attempt to review the currently available methodologies or the novel insights brought using them. In this work, we discuss the quantitative relationships observed between CNA and gene expression in multiple cancer types and biological milestones achieved using the available methodologies. We discuss the conceptual evolution of both the step-wise and the joint data integration methodologies over the last decade. We conclude by providing suggestions for building efficient data integration methodologies and asking further biological questions.
    Briefings in Bioinformatics 09/2011; 13(3):305-16. DOI:10.1093/bib/bbr056 · 5.92 Impact Factor
