ABSTRACT: We report next-generation transcriptome sequencing results for three human hepatocellular carcinoma (HCC) tumor/tumor-adjacent pairs. This analysis robustly examined ∼12,000 genes for both expression differences and molecular alterations. We observed 4,513 genes with a 2-fold or greater increase and 1,182 genes with a 2-fold or greater decrease in expression relative to their matched normal tissue. Network analysis of the expression data identified the Aurora B signaling, FOXM1 transcription factor network and Wnt signaling pathways as altered in HCC. We validated these differential gene expression findings in a large data set of 434 liver normal/tumor sample pairs. In addition to known driver mutations in TP53 and CTNNB1, our mutation analysis identified non-synonymous mutations in genes implicated in metabolic diseases, i.e. diabetes and obesity: IRS1, HMGCS1, ATP8B1, PRMT6 and CLU, suggesting a common molecular etiology for HCC of alternative pathogenic origin.
ABSTRACT: The development and progression of cancer is associated with disruption of biological networks. Historically, studies have identified sets of signature genes involved in events ultimately leading to the development of cancer. Identification of such sets does not indicate which biologic processes are oncogenic drivers and makes it difficult to identify key networks to target for interventions. Using a comprehensive, integrated computational approach, the authors identify the sonic hedgehog (SHH) pathway as the gene network that most significantly distinguishes tumour and tumour-adjacent samples in human hepatocellular carcinoma (HCC). The analysis reveals that the SHH pathway is commonly activated in the tumour samples and that its activity most significantly differentiates tumour from non-tumour samples. The authors experimentally validate these in silico findings in the same biologic material using Western blot analysis. This analysis reveals that the expression levels of SHH, phosphorylated cyclin B1, and CDK7 are much higher in most tumour tissues than in normal tissue. It is also shown that siRNA-mediated silencing of SHH gene expression resulted in a significant reduction of cell proliferation in a liver cancer cell line, SNU449, indicating that SHH plays a major role in promoting cell proliferation in liver cancer. The SHH pathway is a key network underpinning HCC aetiology, which may guide the development of interventions for this most common form of human liver cancer.
Article · Dec 2013 · IET Systems Biology
ABSTRACT: To characterize somatic alterations in colorectal carcinoma, we conducted a genome-scale analysis of 276 samples, analysing exome sequence, DNA copy number, promoter methylation and messenger RNA and microRNA expression. A subset of these samples (97) underwent low-depth-of-coverage whole-genome sequencing. In total, 16% of colorectal carcinomas were found to be hypermutated: three-quarters of these had the expected high microsatellite instability, usually with hypermethylation and MLH1 silencing, and one-quarter had somatic mismatch-repair gene and polymerase ε (POLE) mutations. Excluding the hypermutated cancers, colon and rectum cancers were found to have considerably similar patterns of genomic alteration. Twenty-four genes were significantly mutated, and in addition to the expected APC, TP53, SMAD4, PIK3CA and KRAS mutations, we found frequent mutations in ARID1A, SOX9 and FAM123B. Recurrent copy-number alterations include potentially drug-targetable amplifications of ERBB2 and newly discovered amplification of IGF2. Recurrent chromosomal translocations include the fusion of NAV2 and WNT pathway member TCF7L1. Integrative analyses suggest new markers for aggressive colorectal carcinoma and an important role for MYC-directed transcriptional activation and repression.
ABSTRACT: The PathOlogist is a new tool designed to transform large sets of gene expression data into quantitative descriptors of pathway-level behavior. The tool aims to provide a robust alternative to the search for single-gene-to-phenotype associations by accounting for the complexity of molecular interactions.
Molecular abundance data is used to calculate two metrics, 'activity' and 'consistency', for each pathway in a set of more than 500 canonical molecular pathways (source: Pathway Interaction Database, http://pid.nci.nih.gov). The tool then allows a detailed exploration of these metrics through integrated visualization of pathway components and structure, hierarchical clustering of pathways and samples, and statistical analyses designed to detect associations between pathway behavior and clinical features.
The PathOlogist provides a straightforward means to identify the functional processes, rather than individual molecules, that are altered in disease. The statistical power and biologic significance of this approach are made easily accessible to laboratory researchers and informatics analysts alike. Here we show, as an example, how the PathOlogist can be used to establish pathway signatures that robustly differentiate breast cancer cell lines based on response to treatment.
Article · May 2011 · BMC Bioinformatics
ABSTRACT: Table S2 shows the entire panel of subjects for the following pathway: “cdc25 and chk1 regulatory pathway in response to DNA damage”. This pathway is composed of 9 genes. The table shows the copy number alterations across 145 breast cancer patients: −1 indicates deletion, 1 indicates amplification and 0 indicates no significant change.
(0.19 MB DOC)
ABSTRACT: Bonferroni correction was applied to the p-values calculated using the Fisher Omnibus test in order to address the problem of multiple comparisons. The significance threshold was set to 8.834 × 10^−5, which is 0.05/566 (where 566 is the number of pathways). Table S1 shows all 566 pathways calculated from Chin's dataset with the p-value calculated via the Fisher Omnibus test. In addition, every p-value was adjusted and pathway significance was reassigned.
(0.65 MB DOC)
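The threshold quoted in this entry can be reproduced with a few lines of Python. This is only a minimal sketch of a standard Bonferroni correction, not the authors' code: the function names are hypothetical, and the p-values in the usage lines are made up for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


# Threshold for 566 pathways at an overall alpha of 0.05 (values from the entry).
threshold = bonferroni_threshold(0.05, 566)
print(f"{threshold:.3e}")  # prints 8.834e-05

# Illustrative (made-up) p-values, not taken from Table S1.
example_p = [1e-6, 0.5]
print(bonferroni_adjust(example_p))
```

Comparing a raw p-value against `alpha / n_tests` and comparing the adjusted p-value against `alpha` lead to the same decision, so either form can be used to reassign pathway significance.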
ABSTRACT: The table details the Fisher Omnibus value for each pathway. Columns 3 and onward give the detailed p-values obtained through the hypergeometric function, calculated per patient, per pathway.
(1.56 MB XLS)
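As an illustration of how per-patient p-values might be combined into one value per pathway, the sketch below implements Fisher's combined probability test. It assumes that the "Fisher Omnibus" value in this entry refers to that test and that the per-patient p-values are treated as independent. The function name and the example inputs are hypothetical, not taken from the table.

```python
import math


def fisher_omnibus(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, where k = len(p_values).
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Made-up per-patient p-values for a single pathway.
combined = fisher_omnibus([0.01, 0.20, 0.03, 0.50])
print(combined)
```

The closed form avoids a dependency on a statistics library. It applies here because the chi-square distribution in Fisher's method always has an even number of degrees of freedom.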
ABSTRACT: Table S3, presented here, shows all pathways that were found to be significant using Kaplan-Meier survival analysis. All of the pathways presented here were found to be significantly targeted through copy number alteration using the Fisher Omnibus test (after correction). All 29 pathways were tested in two more public datasets obtained from GEO (http://www.ncbi.nlm.nih.gov/geo). A - activity, C - consistency.
(0.05 MB DOC)
ABSTRACT: High resolution, system-wide characterizations have demonstrated the capacity to identify genomic regions that undergo genomic aberrations. Such research efforts often aim at associating these regions with disease etiology and outcome. Identifying the corresponding biologic processes that are responsible for disease and its outcome remains challenging. Using novel analytic methods that utilize the structure of biologic networks, we are able to identify the specific networks that are highly significantly, nonrandomly altered by regions of copy number amplification observed in a systems-wide analysis. We demonstrate this method in breast cancer, where the state of a subset of the pathways identified through these regions is shown to be highly associated with disease survival and recurrence.
ABSTRACT: Biological Pathway Exchange (BioPAX) is a standard language to represent biological pathways at the molecular and cellular level and to facilitate the exchange of pathway data. The rapid growth of the volume of pathway data has spurred the development of databases and computational tools to aid interpretation; however, use of these data is hampered by the current fragmentation of pathway information across many databases with incompatible formats. BioPAX, which was created through a community process, solves this problem by making pathway data substantially easier to collect, index, interpret and share. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions and gene regulation networks. Using BioPAX, millions of interactions, organized into thousands of pathways, from many organisms are available from a growing number of databases. This large amount of pathway data in a computable form will support visualization, analysis and biological discovery.
ABSTRACT: Recent publications have described and applied a novel metric that quantifies the genetic distance of an individual with respect to two population samples, and have suggested that the metric makes it possible to infer the presence of an individual of known genotype in a sample for which only the marginal allele frequencies are known. However, the assumptions, limitations, and utility of this metric remained incompletely characterized. Here we present empirical tests of the method using publicly accessible genotypes, as well as analytical investigations of the method's strengths and limitations. The results reveal that the null distribution is sensitive to the underlying assumptions, making it difficult to accurately calibrate thresholds for classifying an individual as a member of the population samples. As a result, the false-positive rates obtained in practice are considerably higher than previously believed. However, despite the metric's inadequacies for identifying the presence of an individual in a sample, our results suggest potential avenues for future research on tuning this method to problems of ancestry inference or disease prediction. By revealing both the strengths and limitations of the proposed method, we hope to elucidate situations in which this distance metric may be used in an appropriate manner. We also discuss the implications of our findings in forensics applications and in the protection of GWAS participant privacy.
ABSTRACT: Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.
ABSTRACT: Table S1.2, Bacterial and Viral Virtual Tag Library. Library contains 21 bp NlaIII tags closest to SacI sites from indicated bacterial or viral databases (bacterial genomes obtained from ; viral genomes obtained from ftp://ftp.ncbi.nih.gov/refseq/release/viral/) which are not present in the human genome sequence (Build 36, March 24 2008). File 2 of 3.