Molecular signatures database (MSigDB) 3.0.

Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
Bioinformatics (Impact Factor: 4.62). 06/2011; 27(12):1739-40. DOI: 10.1093/bioinformatics/btr260
Source: PubMed

ABSTRACT: Well-annotated gene sets representing the universe of the biological processes are critical for meaningful and insightful interpretation of large-scale genomic data. The Molecular Signatures Database (MSigDB) is one of the most widely used repositories of such sets.
We report the availability of a new version of the database, MSigDB 3.0, with over 6700 gene sets, a complete revision of the collection of canonical pathways and experimental signatures from publications, enhanced annotations and upgrades to the web site.
MSigDB is freely available for non-commercial use at

  • ABSTRACT: The database of Gene Co-Regulation (dGCR) is a web tool for the analysis of gene relationships based on correlated patterns of gene expression over publicly available transcriptional data. The motivation behind dGCR is that genes whose expression patterns correlate across many experiments tend to be co-regulated and hence share biological function. In addition to revealing functional connections between individual gene pairs, extended sets of co-regulated genes can also be assessed for enrichment of gene ontology classes and interaction pathways. This functionality provides an insight into the biological function of the query gene itself. The dGCR web tool extends the range of expression data curated by existing co-regulation databases and provides additional insights into gene function through the analysis of pathways, gene ontology classes and co-regulation modules.
    Journal of genomics. 01/2015; 3:29-35.
  • ABSTRACT: Transcriptome-based biosensors are expected to have a large impact on the future of biotechnology. However, a central aspect of transcriptomics is differential expression analysis, where, currently, deep RNA sequencing (RNA-seq) has the potential to replace the microarray as the standard assay for RNA quantification. Our contributions here to RNA-seq differential expression analysis are two-fold. First, given the high cost of an RNA-seq run, biological replicates are rare, and therefore, information sharing across genes to obtain variance estimates is crucial. To handle such information sharing in a rigorous manner, we propose a hierarchical, empirical Bayes approach (R-EBSeq) that combines the Cufflinks model for generating relative transcript abundance measurements, known as FPKM (fragments per kilobase of transcript length per million mapped reads), with the EBArrays framework, which was previously developed for empirical Bayes analysis of microarray data. A desirable feature of R-EBSeq is easy-to-implement analysis of more than pairwise comparisons, as we illustrate with experimental data. Second, we develop a standard RNA-seq test data set, on the level of reads, where 79 transcripts are artificially differentially expressed and, therefore, explicitly known. This test data set allows us to compare the performance, in terms of the true discovery rate, of R-EBSeq to three other widely used RNA-seq data analysis packages: Cuffdiff, DESeq and BaySeq. Our analysis indicates that DESeq identifies the first half of the differentially expressed transcripts well, but then is outperformed by Cuffdiff and R-EBSeq. Cuffdiff and R-EBSeq are the two top performers. Thus, R-EBSeq offers good performance, while allowing flexible and rigorous comparison of multiple biological conditions.
    Biosensors. 09/2013; 3(3):238-58.
  • ABSTRACT: Fibrolamellar hepatocellular carcinoma (FLC) is a rare primary hepatic cancer that develops in children and young adults without cirrhosis. Little is known about its pathogenesis, and it can only be treated with surgery. We performed an integrative genomic analysis of a large series of patients with FLC to identify associated genetic factors. Using 78 clinically annotated FLC samples, we performed whole-transcriptome (n=58), single-nucleotide polymorphism array (n=41), and next-generation sequencing (n=48) analyses; we also assessed the prevalence of the DNAJB1-PRKACA fusion transcript associated with this cancer (n=73). We performed class discovery using non-negative matrix factorization, and functional annotation using gene set enrichment analyses, nearest template prediction, ingenuity pathway analyses, and immunohistochemistry. The genomic identification of significant targets in cancer algorithm was used to identify chromosomal aberrations, MuTect and VarScan2 were used to identify somatic mutations, and the random survival forest was used to determine patient prognoses. Findings were validated in an independent cohort. Unsupervised gene expression clustering revealed 3 robust molecular classes of tumors: the proliferation class (51% of samples) had altered expression of genes that regulate proliferation and mTOR signaling activation; the inflammation class (26% of samples) had altered expression of genes that regulate inflammation and cytokine production; and the unannotated class (23% of samples) had a gene expression signature not previously associated with liver tumors. Expression of genes that regulate neuroendocrine function, as well as histologic markers of cholangiocytes and hepatocytes, were detected in all 3 classes. FLCs had few copy number variations; the most frequent were focal amplification at 8q24.3 (in 12.5% of samples) and deletions at 19p13 (in 28% of samples) and 22q13.32 (in 25% of samples). The DNAJB1-PRKACA fusion transcript was detected in 79% of samples. FLC samples also contained mutations in cancer-related genes such as BRCA2 (in 4.2% of samples), which are uncommon in liver neoplasms. However, FLCs did not contain mutations most commonly detected in liver cancers. We identified an 8-gene signature that predicted survival of patients with FLC. In a genomic analysis of 78 FLC samples, we identified 3 classes based on gene expression profiles. FLCs contain mutations and chromosomal aberrations not previously associated with liver cancer, and almost 80% contain the DNAJB1-PRKACA fusion transcript. Using this information, we identified a gene signature that is associated with patient survival time. Copyright © 2014 AGA Institute. Published by Elsevier Inc. All rights reserved.
    Gastroenterology (Impact Factor: 12.82). 12/2014.
