ABSTRACT: Hepatoblastoma (HB) is the most common primary liver tumor in children. Mutations in the β-catenin gene that lead to constitutive activation of the Wnt pathway have been detected in a large proportion of HB tumors. To identify novel mutations in HB, we performed whole-exome sequencing of 6 paired HB tumors and their corresponding lymphocytes. This identified 24 somatic non-synonymous mutations in 21 genes, many of which were novel, including three novel mutations targeting the CTNNB1 (G512V) and CAPRIN2 (R968H/S969C) genes in the Wnt pathway, and genes previously shown to be involved in the ubiquitin ligase complex (SPOP, KLHL22, TRPC4AP and RNF169). Functionally, both the CTNNB1 (G512V) and CAPRIN2 (R968H/S969C) mutations were observed to be gain-of-function mutations, and CAPRIN2 (R968H/S969C) was also shown to activate the Wnt pathway in HB cells. These findings suggested the activation of the Wnt pathway in HB, which was confirmed by immunohistochemical staining of β-catenin in 42 HB tumors. We further used shRNA-mediated interference to assess the effect of the 21 mutated genes on HB cell survival. The results suggested that 1 novel oncogene (CAPRIN2) and 3 tumor suppressors (SPOP, OR5I1 and CDC20B) influence HB cell growth. Moreover, we found that SPOP S119N is a loss-of-function mutation in HB cells. We finally demonstrated that one of the mechanisms by which SPOP inhibits HB cell proliferation is through regulating CDKN2B expression. Conclusion: these results extend the landscape of genetic alterations in HB and highlight the dysregulation of Wnt and ubiquitin pathways in HB tumorigenesis. (Hepatology 2014;).
ABSTRACT: Recently, a number of studies have performed genome or exome sequencing of hepatocellular carcinoma (HCC) and identified hundreds or even thousands of mutations in protein-coding genes. However, these studies have only focused on a limited number of candidate genes, and many important mutation resources remain to be explored.
PLoS ONE 01/2014; 9(7):e100854. · 3.73 Impact Factor
ABSTRACT: Hepatocellular carcinoma (HCC) is one of the most malignant and lethal cancers in the world. Its pathogenesis has been reported to be multi-factorial, and the molecular carcinogenesis of HCC cannot be attributed to just a few individual genes. Based on the microRNA and mRNA expression profiling of normal liver tissues, pericancerous hepatocellular tissues and hepatocellular carcinoma tissues, we attempted to find prognosis-related gene sets for HCC patients.
We identified differentially expressed genes (DEGs) from three comparisons: Cancer/Normal, Cancer/Pericancerous and Pericancerous/Normal. Gene set enrichment analysis (GSEA) was performed. Based on the enriched gene sets of GO terms, pathways and transcription factor targets, we found that genome instability and cell proliferation increased while metabolism and differentiation decreased in HCC tissues. The expression profile of the DEGs in each enriched gene set was correlated with the postoperative survival time of HCC patients. Nine gene sets were found to correlate with prognosis. Furthermore, after substituting DEG-targeting microRNAs for the DEG members of each gene set, two gene sets with microRNA expression profiles were obtained that had prognostic potential.
The malignancy of HCC could be represented by gene sets, and pericancerous liver exhibits important characteristics of liver cancer. The expression level of gene sets not only in HCC but also in the pericancerous liver showed potential for prognosis implying an option for HCC prognosis at an early stage. Additionally, the gene-targeting-microRNA expression profiles also showed prognostic potential, demonstrating that the multi-factorial molecular pathogenesis of HCC is contributed by various genes and microRNAs.
ABSTRACT: Transcriptional regulatory networks (TRNs) are used to study conditional regulatory relationships between transcription factors and genes. However, few studies have tried to integrate genomic variation information such as copy number variation (CNV) with TRNs to find causal disturbances in a network. Intrahepatic cholangiocarcinoma (ICC) is the second most common hepatic carcinoma, with high malignancy and poor prognosis. Research on ICC is relatively limited compared with hepatocellular carcinoma, and there are no approved gene therapeutic targets yet.
PLoS ONE 01/2014; 9(6):e98653. · 3.73 Impact Factor
ABSTRACT: Current predictors for estrogen receptor-positive (ER-positive) breast cancer patients receiving tamoxifen are often invalid in inter-laboratory validation. We aim to develop a robust predictor based on the relative ordering of expression measurement (ROE) in gene pairs. Using a large integrated dataset of 420 normal controls and 1,129 ER-positive breast tumor samples, we identified the gene pairs with stable ROEs in normal control and significantly reversed ROEs in ER-positive tumor. Using these gene pairs, we characterized each sample of a cohort of 292 ER-positive patients who received tamoxifen monotherapy for 5 years and then identified relapse risk-associated gene pairs. We extracted a gene pair subset that resulted in the largest positive and negative predictive values for predicting 10-year relapse-free survival (RFS) using a genetic algorithm. A predictor was developed based on the gene pair subset and was validated in 2 large multi-laboratory cohorts (N = 250 and 248, respectively) of ER-positive patients who received 5-year tamoxifen alone. In the first validation cohort, the patients predicted to be tamoxifen sensitive had a 10-year RFS of 91 % (95 % confidence interval [CI] 85-97 %) with an absolute risk reduction of 34 % (95 % CI 17-51 %). The patients predicted to be tamoxifen insensitive had a significantly higher relapse risk than the patients predicted to be tamoxifen sensitive (hazard ratio = 4.99, 95 % CI 2.45-10.17, P = 9.13 × 10⁻⁷). Similar performance was achieved for the second validation cohort. The predictor performed well in both node-negative and node-positive subsets and added significant predictive power to the clinical parameters. In contrast, 2 previously proposed predictors did not achieve significantly better performances than the baselines of the validation cohorts.
In summary, the proposed predictor can accurately and robustly predict tamoxifen sensitivity of ER-positive breast cancer patients and identify patients with a high probability of 10-year RFS following tamoxifen monotherapy.
Breast Cancer Research and Treatment 11/2013; · 4.47 Impact Factor
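The gene-pair idea in the abstract above can be illustrated with a minimal Python sketch. It is not the published predictor: the function names, the `min_frac` stability threshold and the toy expression values are all assumptions made for illustration, and the genetic-algorithm selection step is omitted. The sketch only shows how pairs with a stable relative ordering in normal samples can be found and how the fraction of reversed pairs can be scored for a tumor sample.

```python
from typing import Dict, List, Tuple

Sample = Dict[str, float]  # gene name -> expression measurement


def stable_pairs(normals: List[Sample], genes: List[str],
                 min_frac: float = 0.95) -> List[Tuple[str, str]]:
    """Return pairs (hi, lo) with hi > lo in at least min_frac of normal samples.

    min_frac is a hypothetical threshold, not a value taken from the abstract.
    """
    pairs = []
    n = len(normals)
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            a_above = sum(s[a] > s[b] for s in normals) / n
            b_above = sum(s[b] > s[a] for s in normals) / n
            if a_above >= min_frac:
                pairs.append((a, b))
            elif b_above >= min_frac:
                pairs.append((b, a))
    return pairs


def reversal_fraction(sample: Sample, pairs: List[Tuple[str, str]]) -> float:
    """Fraction of stable pairs whose ordering is reversed in this sample."""
    if not pairs:
        return 0.0
    reversed_count = sum(sample[hi] < sample[lo] for hi, lo in pairs)
    return reversed_count / len(pairs)
```

Because only the within-sample ordering of two genes is used, the score does not depend on the absolute measurement scale, which is the property the abstract relies on for robustness across laboratories.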
ABSTRACT: In microarray-based case-control studies of a disease, people often attempt to identify a few diagnostic or prognostic markers amongst the most significant differentially expressed (DE) genes. However, the reproducibility of DE genes identified in different studies for a disease is typically very low. To tackle the problem, we could evaluate the reproducibility of DE genes across studies and define robust markers for disease diagnosis using disease-associated protein-protein interaction (PPI) subnetwork. Using datasets for four cancer types, we found that the most significant DE genes in cancer exhibit consistent up- or down-regulation in different datasets. For each cancer type, the 5 (or 10) most significant DE genes separately extracted from different datasets tend to be significantly coexpressed and closely connected in the PPI subnetwork, thereby indicating that they are highly reproducible at the PPI level. Consequently, we were able to build robust subnetwork-based classifiers for cancer diagnosis.
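The claim that DE genes show consistent up- or down-regulation across datasets can be checked with a simple sign-concordance count. The Python sketch below is an illustration only: the function name, the input layout (gene name mapped to a log fold change) and the one-sided binomial test against a 50% chance of agreement are assumptions, not the procedure used in the study.

```python
from math import comb
from typing import Dict, Tuple


def direction_concordance(de_a: Dict[str, float],
                          de_b: Dict[str, float]) -> Tuple[int, int, float]:
    """Compare regulation direction of genes detected as DE in two datasets.

    Returns (concordant, shared, p_value), where p_value is the one-sided
    binomial probability of seeing at least `concordant` agreements among
    `shared` genes if directions agreed by chance (probability 0.5).
    A log fold change of exactly 0 is treated as "not up" here.
    """
    shared = [g for g in de_a if g in de_b]
    n = len(shared)
    concordant = sum((de_a[g] > 0) == (de_b[g] > 0) for g in shared)
    p_value = sum(comb(n, k) for k in range(concordant, n + 1)) / 2 ** n
    return concordant, n, p_value
```

With three shared genes that all agree in direction, the function returns a p-value of 0.125, which shows why a reasonably large overlap is needed before concordance becomes statistically convincing.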
ABSTRACT: The heterogeneity of genetic alterations in human cancer genomes presents a major challenge to advancing our understanding of cancer mechanisms and identifying cancer driver genes. To tackle this heterogeneity problem, many approaches have been proposed to investigate genetic alterations and predict driver genes at the individual pathway level. However, most of these approaches ignore the correlation of alteration events between pathways and miss many genes with rare alterations collectively contributing to carcinogenesis. Here, we devise a network-based approach to capture the cooperative functional modules hidden in genome-wide somatic mutation and copy number alteration profiles of glioblastoma (GBM) from The Cancer Genome Atlas (TCGA), where a module is a set of altered genes with dense interactions in the protein interaction network. We identify 7 pairs of significantly co-altered modules that involve the main pathways known to be altered in GBM (TP53, RB and RTK signaling pathways) and highlight the striking co-occurring alterations among these GBM pathways. By taking into account the non-random correlation of gene alterations, the property of co-alteration could distinguish oncogenic modules that contain driver genes involved in the progression of GBM. The collaboration among cancer pathways suggests that the redundant models and aggravating models could shed new light on the potential mechanisms during carcinogenesis and provide new indications for the design of cancer therapeutic strategies.
ABSTRACT: Directly comparing gene expression profiles of estrogen receptor-positive (ER+) and estrogen receptor-negative (ER-) breast cancers cannot determine whether differentially expressed genes between these two subtypes result from dysregulated expression in ER+ cancer or ER- cancer versus normal controls, and thus would miss critical information for elucidating the transcriptomic difference between the two subtypes.
Using microarray datasets from TCGA, we classified the genes dysregulated in both ER+ and ER- cancers versus normal controls into two classes: (i) genes dysregulated in the same direction but to a different extent, and (ii) genes dysregulated in opposite directions, and then validated the two classes in RNA-sequencing datasets of independent cohorts. We showed that the genes dysregulated to a larger extent in ER+ cancers than in ER- cancers were enriched in glycerophospholipid and polysaccharide metabolic processes, while the genes dysregulated to a larger extent in ER- cancers than in ER+ cancers were enriched in cell proliferation. Phosphorylase kinase and enzymes of glycosylphosphatidylinositol (GPI) anchor biosynthesis were upregulated to a larger extent in ER+ cancers than in ER- cancers, whereas glycogen synthase and phospholipase A2 were downregulated to a larger extent in ER+ cancers than in ER- cancers. We also found that the genes oppositely dysregulated in the two subtypes were significantly enriched with known cancer genes and tended to closely collaborate with the cancer genes. Furthermore, we showed the possibility that these oppositely dysregulated genes could contribute to carcinogenesis of ER+ and ER- cancers through rewiring different subpathways.
GPI-anchor biosynthesis and glycogenolysis were elevated and hydrolysis of phospholipids was depleted to a larger extent in ER+ cancers than in ER- cancers. Our findings indicate that the genes oppositely dysregulated in the two subtypes are potential cancer genes which could contribute to carcinogenesis of both ER+ and ER- cancers through rewiring different subpathways.
PLoS ONE 01/2013; 8(7):e70017. · 3.73 Impact Factor
ABSTRACT: It is common practice for researchers to collect a set of samples without discriminating between mutants and their wild-type counterparts in order to characterize the transcriptional, methylation and/or copy number changes of pre-defined candidate oncogenes or tumor suppressor genes (TSGs), although in some known examples carcinogenic mutants may be expressed and may function completely differently from their wild-type counterparts.
Based on various high-throughput data without mutation information for typical cancer types, we surprisingly found that about half of known oncogenes (or TSGs) pre-defined by mutations were down-regulated (or up-regulated) and hypermethylated (or hypomethylated) in their corresponding cancer types. Therefore, the overall expression and/or methylation changes of genes detected in a set of samples without discriminating the mutants and their wild-type counterparts cannot indicate the carcinogenic roles of the mutants. We also found that about half of known oncogenes were located in deletion regions, whereas all known TSGs were located in deletion regions. Thus, both oncogenes and TSGs may be located in deletion regions and thus deletions can indicate TSGs only if the gene is found to be deleted as a whole. In contrast, amplifications are restricted to oncogenes and thus can be used to support either the dysregulated wild-type gene or its mutant as an oncogene.
We demonstrated that using transcriptional, methylation and/or copy number changes without mutation information to characterize oncogenes and TSGs, a strategy that is still widely adopted, will most often produce misleading results. Our analysis highlights the importance of evaluating expression, methylation and copy number changes together with gene mutation data in the same set of samples in order to determine the distinct roles of the mutants and their wild-type counterparts.
PLoS ONE 01/2013; 8(3):e58163. · 3.73 Impact Factor
ABSTRACT: Some researchers normalize DNA methylation array data in order to remove the technical artifacts introduced by experimental differences in sample preparation, array processing and other factors. However, other researchers analyze DNA methylation arrays without performing data normalization, considering that current normalizations for methylation data may distort real differences between normal and cancer samples because cancer genomes may be extensively subject to hypomethylation and the total amount of CpG methylation might differ substantially among samples. In this study, using eight datasets generated with the Infinium HumanMethylation27 assay, we systematically analyzed the global distribution of DNA methylation changes in cancer compared to normal control and its effect on data normalization for selecting differentially methylated (DM) genes. We showed that more DM genes could be found in the Quantile/Lowess-normalized data than in the non-normalized data. We found that the DM genes additionally selected in the Quantile/Lowess-normalized data showed significantly consistent methylation states in another independent dataset for the same cancer, indicating that these extra DM genes were effective biological signals related to the disease. These results suggested that normalization can increase the power of detecting DM genes in the context of diagnostic markers, which were usually characterized by relatively large effect sizes. Besides, we evaluated the reproducibility of DM discoveries for a particular cancer type, and we found that most of the DM genes additionally detected in one dataset showed the same methylation directions in the other dataset for the same cancer type, indicating that these DM genes were effective biological signals in the other dataset. Furthermore, we showed that some DM genes detected from different studies for a particular cancer type were significantly reproducible at the functional level.
ABSTRACT: Based on the assumption that only a few genes are differentially expressed in a disease and have balanced upward and downward expression level changes, researchers usually normalise microarray data by forcing all of the arrays to have the same probe intensity distributions to remove technical variations in the data. However, accumulated evidence suggests that gene expressions could be widely altered in cancer, so we need to evaluate the sensitivities of biological discoveries to violation of the normalisation assumption. Here, we show that the medians of the original probe intensities increase in most of the ten cancer types analyzed in this paper, indicating that genes may be widely up-regulated in many cancer types. Thus, at least for cancer study, normalising all arrays to have the same distribution of probe intensities regardless of the state (diseased vs. normal) tends to falsely produce many down-regulated differentially expressed (DE) genes while missing many truly up-regulated DE genes. We also show that the DE genes solely detected in the non-normalised data for cancers are highly reproducible across different datasets for the same cancers, indicating that effective biological signals naturally exist in the non-normalised data. Because the powers of current statistical analyses using the non-normalised data tend to be low, we suggest selecting DE genes in both normalised and non-normalised data and then filtering out the false DE genes extracted from the normalised data that show opposite deregulation directions in the non-normalised data.
ABSTRACT: When using microarray data for studying a complex disease such as cancer, it is a common practice to normalize data to force all arrays to have the same distribution of probe intensities regardless of the biological groups of samples. The assumption underlying such normalization is that in a disease the majority of genes are not differentially expressed genes (DE genes) and the numbers of up- and down-regulated genes are roughly equal. However, accumulated evidence suggests that gene expressions could be widely altered in cancer, so we need to evaluate the sensitivities of biological discoveries to violation of the normalization assumption. Here, we analyzed 7 large Affymetrix datasets of pair-matched normal and cancer samples for cancers collected in the NCBI GEO database. We showed that in 6 of these 7 datasets, the medians of perfect match (PM) probe intensities increased in the cancer state and the increases were significant in three datasets, suggesting that the assumption that all arrays have the same median probe intensities regardless of the biological groups of samples might be misleading. Then, we evaluated the effects of the three currently most widely used normalization algorithms (RMA, MAS5.0 and dChip) on the selection of DE genes by comparing them with LVS, which relies less on the above-mentioned assumption. The results showed that using RMA, MAS5.0 and dChip may produce many false down-regulated DE genes while missing many up-regulated DE genes. At least for cancer study, normalizing all arrays to have the same distribution of probe intensities regardless of the biological groups of samples might be misleading. Thus, most current normalizations based on unreliable assumptions may distort biological differences between normal and cancer samples. The LVS algorithm might perform relatively well because it relies less on the above-mentioned assumption.
Also, our results indicate that genes may be widely up-regulated in most human cancers.
Computational biology and chemistry 06/2011; 35(3):126-30. · 1.37 Impact Factor
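The concern raised in the normalization abstracts above can be made concrete with a small Python sketch of textbook quantile normalization. This is a generic illustration, not the RMA, MAS5.0, dChip or LVS implementations named in the abstract; the function name and the toy intensities are assumptions, and ties are broken by position for simplicity.

```python
from statistics import median
from typing import List


def quantile_normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Force every array to share the same intensity distribution.

    Each array is a list of probe intensities in the same probe order.
    The reference distribution is the mean of the sorted arrays, and each
    value is replaced by the reference value at its rank.
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays)
                 for k in range(n_probes)]
    normalized = []
    for a in arrays:
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


# Toy example: every probe is higher by 1 in the "cancer" array.
normal = [1.0, 2.0, 3.0, 4.0, 5.0]
cancer = [2.0, 3.0, 4.0, 5.0, 6.0]
norm_normal, norm_cancer = quantile_normalize([normal, cancer])
median_shift_before = median(cancer) - median(normal)
median_shift_after = median(norm_cancer) - median(norm_normal)
```

In this toy case both normalized arrays become identical, so a global up-regulation that was visible before normalization (median shift of 1) disappears entirely afterwards (median shift of 0). This is the kind of distortion the abstracts warn about when the "few, balanced changes" assumption does not hold.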
ABSTRACT: Differential expression of microRNA (miRNA) is involved in many human diseases and could potentially be used as a biomarker for disease diagnosis, prognosis, and therapy. However, inconsistency has often been found among differentially expressed miRNAs identified in various studies when using miRNA arrays for a particular disease such as a cancer. Before broadly applying miRNA arrays in a clinical setting, it is critical to evaluate inconsistent discoveries in a rational way. Thus, using data sets from 2 types of cancers, our study shows that the differentially expressed miRNAs detected from multiple experiments for each cancer exhibit stable regulation direction. This result also indicates that miRNA arrays could be used to reliably capture the signals of the regulation direction of differentially expressed miRNAs in cancer. We then assumed that 2 differentially expressed miRNAs with the same regulation direction in a particular cancer play similar functional roles if they regulate the same set of cancer-associated genes. On the basis of this hypothesis, we proposed a score to assess the functional consistency between differentially expressed miRNAs separately extracted from multiple studies for a particular cancer. We showed that although lists of differentially expressed miRNAs identified from different studies for each cancer were highly variable, they were rather consistent at the level of function. Thus, the detection of differentially expressed miRNAs in various experiments for a certain disease tends to be functionally reproducible and to capture functionally related differential expression of miRNAs in the disease.
Molecular Cancer Therapeutics 03/2011; 10(5):752-60. · 5.60 Impact Factor
ABSTRACT: By high-throughput screens of somatic mutations of genes in cancer genomes, hundreds of cancer genes are being rapidly identified, providing abundant information for systematically deciphering the genetic changes underlying cancer mechanisms. However, the functional collaboration of mutated genes is often neglected in current studies. Here, using four genome-wide somatic mutation data sets and pathways defined in various databases, we showed that gene pairs significantly comutated in cancer samples tend to distribute between pathways rather than within pathways. At the basic functional level of motifs in the human protein-protein interaction network, we also found that comutated gene pairs were overrepresented between motifs but extremely depleted within motifs. Specifically, we showed that based on Gene Ontology, which describes gene functions at various specific levels, we could tackle the pathway definition problem to some degree and study the functional collaboration of gene mutations in cancer genomes more efficiently. Then, by defining pairs of pathways frequently linked by comutated gene pairs as the between-pathway models, we showed that they are also likely to be codisrupted by mutations of the interpathway hubs of the coupled pathways, suggesting new hints for understanding the heterogeneous mechanisms of cancers. Finally, we showed that some between-pathway models consisting of important pathways such as cell cycle checkpoint and cell proliferation were codisrupted in most cancer samples under this study, suggesting that their codisruptions might be functionally essential in inducing these cancers. Altogether, our results provide a channel to disentangle the complex collaboration of the molecular processes underlying cancer mechanisms.
Molecular Cancer Therapeutics 08/2010; 9(8):2186-95. · 5.60 Impact Factor
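One common way to decide whether two genes are "significantly comutated" across samples is a one-sided hypergeometric (Fisher-type) test on their sample counts. The abstract above does not state which test was used, so the Python sketch below is an assumption for illustration; the function and parameter names are hypothetical.

```python
from math import comb


def comutation_pvalue(n_samples: int, n_a: int, n_b: int, n_both: int) -> float:
    """Probability of at least n_both co-mutated samples by chance.

    n_samples: total number of tumor samples
    n_a, n_b:  samples in which gene A / gene B is mutated
    n_both:    samples in which both genes are mutated
    Assumes mutations of A and B occur independently across samples
    (hypergeometric null model).
    """
    total = comb(n_samples, n_b)
    upper = min(n_a, n_b)
    tail = sum(comb(n_a, k) * comb(n_samples - n_a, n_b - k)
               for k in range(n_both, upper + 1))
    return tail / total
```

For example, if each gene is mutated in 3 of 10 samples and all 3 coincide, `comutation_pvalue(10, 3, 3, 3)` gives 1/120. In a genome-wide screen such p-values would still need multiple-testing correction, which this sketch leaves out.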
ABSTRACT: Hundreds of genes that are causally implicated in oncogenesis have been found and collected in various databases. For efficient application of these abundant but diverse data sources, it is of fundamental importance to evaluate their consistency.
First, we showed that the lists of cancer genes from some major data sources were highly inconsistent in terms of overlapping genes. In particular, most cancer genes accumulated in previous small-scale studies could not be rediscovered in current high-throughput genome screening studies. Then, based on a metric proposed in this study, we showed that most cancer gene lists from different data sources were highly functionally consistent. Finally, we extracted functionally consistent cancer genes from various data sources and collected them in our database F-Census.
Although they have very low gene overlap, most cancer gene data sources are highly consistent at the functional level, which indicates that they can separately capture subsets of the genes in a few key pathways associated with cancer. Our results suggest that the sample sizes currently used for cancer studies might be inadequate for consistently capturing individual cancer genes, but could be sufficient for finding a number of cancer genes that functionally represent most cancer genes. The F-Census database provides biologists with a useful tool for browsing and extracting functionally consistent cancer genes from various data sources.
ABSTRACT: The causation of cancer often involves the joint deregulation of multiple biological processes. Thus, it is interesting to extract multi-function features of cancer genes and study their functional coordination involved in tumorigenesis. Here, based on Gene Ontology, we proposed a heuristic strategy to extract multi-function features which are significantly overrepresented with cancer genes. We showed that the extracted feature combinations can provide hints for studying the functional interplay roles of multi-function genes involved in tumorigenesis. Using protein-protein interaction data, we found that the pivot genes of the obtained functional feature combinations are likely to be cancer genes, and that their alterations may play important roles in simultaneously deregulating multiple functions and thus inducing cancer.
Seventh International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2010, 10-12 August 2010, Yantai, Shandong, China; 01/2010
ABSTRACT: Although novel technologies are rapidly emerging, the accumulated cDNA microarray data are and will remain an important source for bioinformatics and biological studies. Thus, the reliability and applicability of cDNA microarray data warrant further evaluation. In cDNA microarrays, multiple clones are measured for a transcript, which can be exploited to evaluate the consistency of microarray data. We show that even for pairs of RCs, the average Pearson correlation coefficient of their measurements is not high. However, this low consistency could largely be explained by random noise signals for a fraction of unexpressed genes and/or low signal-to-noise ratios for low abundance transcripts. Encouragingly, a large fraction of inconsistent data will be filtered out in the procedure of selecting differentially expressed genes (DEGs). Therefore, although cDNA microarray data are of low consistency, applications based on DEG selection could still reach correct biological results, especially at the level of functional modules.
Omics: a journal of integrative biology 09/2009; 13(6):493-9. · 2.29 Impact Factor
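The consistency measure mentioned in the abstract above, the average Pearson correlation between clones that measure the same transcript, can be sketched in plain Python as follows. The function names and the input layout (one pair of measurement vectors per clone pair) are assumptions for illustration, not the study's code.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_clone_correlation(
        clone_pairs: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average Pearson correlation over pairs of clones for the same transcript.

    Each element holds the measurements of two clones across the same arrays.
    """
    values = [pearson(a, b) for a, b in clone_pairs]
    return sum(values) / len(values)
```

A constant measurement vector has zero variance and makes the coefficient undefined, so in practice such clones (for example, unexpressed genes at background level) would need to be filtered out before averaging.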