Magellan: A Web Based System for the Integrated Analysis of Heterogeneous Biological Data and Annotations; Application to DNA Copy Number and Expression Data in Ovarian Cancer

UCSF Cancer Research Institute and Comprehensive Cancer Center, University of California, San Francisco, 2340 Sutter St., San Francisco, California, USA.
Cancer informatics 02/2007; 2:10-21.
Source: PubMed


Recent advances in high throughput biological methods allow researchers to generate enormous amounts of data from a single experiment. In order to extract meaningful conclusions from this tidal wave of data, it will be necessary to develop analytical methods of sufficient power and utility. It is particularly important that biologists themselves be able to perform many of these analyses, such that their background knowledge of the experimental system under study can be used to interpret results and direct further inquiries. We have developed a web-based system, Magellan, which allows the upload, storage, and analysis of multivariate data and textual or numerical annotations. Data and annotations are treated as abstract entities, to maximize the different types of information the system can store and analyze. Annotations can be used in analyses/visualizations, as a means of subsetting data to reduce dimensionality, or as a means of projecting variables from one data type or data set to another. Analytical methods are deployed within Magellan such that new functionalities can be added in a straightforward fashion. Using Magellan, we performed an integrated analysis of genome-wide comparative genomic hybridization (CGH), mRNA expression, and clinical data from ovarian tumors. Analyses included the use of permutation-based methods to identify genes whose mRNA expression levels correlated with patient survival, a nearest neighbor classifier to predict patient survival from CGH data, and curated annotations such as genomic position and derived annotations such as statistical computations to explore the quantitative relationship between CGH and mRNA expression data.
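The abstract mentions permutation-based methods for finding genes whose mRNA expression correlates with patient survival, but does not give the statistic or procedure. The following is only a minimal sketch of one common form of such a test, not Magellan's implementation. It assumes Pearson correlation as the statistic and treats survival as a plain numeric value (ignoring censoring). The function names `pearson` and `permutation_pvalue` and the parameters `n_perm` and `seed` are hypothetical.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant vector has no defined correlation; treat it as zero.
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(expression, survival, n_perm=2000, seed=0):
    """Two-sided permutation p-value for one gene (assumed procedure).

    Survival values are shuffled across patients to break any real
    association, and the observed |r| is compared with the shuffled |r|.
    """
    rng = random.Random(seed)
    observed = abs(pearson(expression, survival))
    shuffled = list(survival)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(expression, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    gene = [2.1, 3.4, 1.9, 5.0, 4.2, 3.8, 2.7, 4.9]
    months = [14.0, 22.0, 11.0, 40.0, 31.0, 27.0, 18.0, 36.0]
    print(permutation_pvalue(gene, months))
```

In a genome-wide setting the same test would be repeated for every gene, so the resulting p-values would need a multiple-testing correction before any gene is called significant.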

  • "Soneson et al. [14] investigated the correlation between gene expression and copy number alterations using canonical correlation analysis for leukemia data. A web-based platform, called Magellan, was developed for the integrated analysis of DNA copy number and expression data in ovarian cancer [15], which found significant correlation between gene expression and patient survival. Troyanskaya et al. [16] developed a Bayesian framework to combine heterogeneous data sources to predict gene function with improved accuracy."
    ABSTRACT: In clinical practice, many diseases such as glioblastoma, leukemia, diabetes, and prostate cancer have multiple subtypes. Classifying subtypes accurately using genomic data will provide individualized treatments to target-specific disease subtypes. However, it is often difficult to obtain satisfactory classification accuracy using only one type of data, because the subtypes of a disease can exhibit similar patterns in one data type. Fortunately, multiple types of genomic data are often available due to the rapid development of genomic techniques. This raises the question of whether the classification performance can be significantly improved by combining multiple types of genomic data. In this article, we classified four subtypes of glioblastoma multiforme (GBM) with multiple types of genome-wide data (e.g., mRNA and miRNA expression) from The Cancer Genome Atlas (TCGA) project. We proposed a multi-class compressed sensing-based detector (MCSD) for this study. The MCSD was trained with data from TCGA and then applied to subtype GBM patients using independent testing data. We performed the classification on the same patient subjects with three data types, i.e., miRNA expression data, mRNA (or gene expression) data, and their combination. The classification accuracy is 69.1% with the miRNA expression data, 52.7% with mRNA expression data, and 90.9% with the combination of both mRNA and miRNA expression data. In addition, some biomarkers identified by the integrated approaches have been confirmed with results from the published literature. These results indicate that the combined analysis can significantly improve the accuracy of classifying GBM subtypes and identify potential biomarkers for disease diagnosis.
    EURASIP Journal on Bioinformatics and Systems Biology 01/2013; 2013(1):2. DOI:10.1186/1687-4153-2013-2
  • ABSTRACT: Array comparative genomic hybridization (array CGH) is a technique for assaying the copy number status of cancer genomes. The widespread use of this technology has led to a rapid accumulation of high-throughput data, which in turn has prompted the development of computational strategies for the analysis of array CGH data. Here we explain the principles behind array image processing, data visualization and genomic profile analysis, review currently available software packages, and raise considerations for future software development.
    Cancer informatics 02/2006; 2:48-58.
  • ABSTRACT: Combined analysis of gene expression array data and array-based comparative genomic hybridization data has been used in a series of 26 pediatric brain tumors to define up- and downregulated genes that coincide with losses, gains, and amplifications involving specific chromosome regions. Frequent losses were defined in chromosome arms 3q, 6q, 8p, 10q, 16q, and 17p, and gains were identified in chromosome 7 and chromosome arms 9p and 17q. Amplification of a 2p region was seen in only one tumor, which corresponded to increased expression of the MYCN and DDX1 genes. To facilitate the analysis of the two data sets, we have developed a custom overlay tool that defines genes that are underexpressed in regions of deletions and overexpressed in regions of gain, across the genome and specifically within regions showing recurrent involvement in medulloblastomas.
    Genes Chromosomes and Cancer 01/2007; 46(1):53-66. DOI:10.1002/gcc.20388 · 4.04 Impact Factor