-
Dong Wang,
Yuannv Zhang,
Yan Huang,
Pengfei Li,
Mingyue Wang,
Ruihong Wu,
Lixin Cheng,
Wenjing Zhang,
Yujing Zhang,
Bin Li,
Chenguang Wang,
Zheng Guo
ABSTRACT: Some researchers normalize DNA methylation array data to remove technical artifacts introduced by experimental differences in sample preparation, array processing and other factors. Others analyze DNA methylation arrays without normalization, on the grounds that current normalization methods for methylation data may distort real differences between normal and cancer samples, because cancer genomes may be extensively hypomethylated and the total amount of CpG methylation might differ substantially among samples. In this study, using eight datasets generated with the Infinium HumanMethylation27 assay, we systematically analyzed the global distribution of DNA methylation changes in cancer compared with normal controls and its effect on data normalization for selecting differentially methylated (DM) genes. We showed that more DM genes could be found in the Quantile/Lowess-normalized data than in the non-normalized data. The DM genes additionally selected in the Quantile/Lowess-normalized data showed significantly consistent methylation states in another independent dataset for the same cancer, indicating that these extra DM genes were effective biological signals related to the disease. These results suggest that normalization can increase the power of detecting DM genes in the context of diagnostic markers, which are usually characterized by relatively large effect sizes. In addition, we evaluated the reproducibility of DM discoveries for a particular cancer type and found that most of the DM genes additionally detected in one dataset showed the same methylation directions in the other dataset for the same cancer type, indicating that these DM genes were effective biological signals in the other dataset. Furthermore, we showed that some DM genes detected in different studies for a particular cancer type were significantly reproducible at the functional level.
Gene 07/2012; 506(1):36-42. · 2.34 Impact Factor
-
Dong Wang,
Lixin Cheng,
Yuannv Zhang,
Ruihong Wu,
Mingyue Wang,
Yunyan Gu,
Wenyuan Zhao,
Pengfei Li,
Bin Li,
Yujing Zhang,
Hongwei Wang,
Yan Huang,
Chenguang Wang,
Zheng Guo
ABSTRACT: Based on the assumption that only a few genes are differentially expressed in a disease and that their upward and downward expression changes are balanced, researchers usually normalise microarray data by forcing all of the arrays to have the same probe intensity distribution to remove technical variation in the data. However, accumulated evidence suggests that gene expression could be widely altered in cancer, so we need to evaluate the sensitivity of biological discoveries to violation of the normalisation assumption. Here, we show that the medians of the original probe intensities increase in most of the ten cancer types analysed in this paper, indicating that genes may be widely up-regulated in many cancer types. Thus, at least for cancer studies, normalising all arrays to have the same distribution of probe intensities regardless of the state (diseased vs. normal) tends to falsely produce many down-regulated differentially expressed (DE) genes while missing many truly up-regulated DE genes. We also show that the DE genes detected solely in the non-normalised data for cancers are highly reproducible across different datasets for the same cancers, indicating that effective biological signals naturally exist in the non-normalised data. Because the power of current statistical analyses using the non-normalised data tends to be low, we suggest selecting DE genes in both normalised and non-normalised data and then filtering out the false DE genes extracted from the normalised data that show opposite deregulation directions in the non-normalised data.
Molecular BioSystems 03/2012; 8(3):818-27. · 3.53 Impact Factor
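The two steps named in the abstract above (forcing all arrays onto one intensity distribution, then discarding DE calls whose direction reverses in the non-normalised data) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the rank-wise-mean form of quantile normalisation, the naive handling of ties, and the choice to keep genes that have no non-normalised measurement are all assumptions made for the example.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Force every array to share one intensity distribution.

    Assumption: plain rank-wise-mean quantile normalisation; tied values
    are ranked by their original position rather than averaged.
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [mean(s[k] for s in sorted_arrays) for k in range(n_probes)]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def filter_opposite_direction(de_normalized, change_raw):
    """Drop DE genes whose change reverses sign in the non-normalised data.

    Both arguments map gene name -> signed change (cancer minus normal).
    Assumption: genes absent from change_raw are kept.
    """
    return {
        gene: change
        for gene, change in de_normalized.items()
        if change * change_raw.get(gene, 0.0) >= 0
    }


arrays = [[5, 2, 3], [4, 1, 6]]
print(quantile_normalize(arrays))
print(filter_opposite_direction({"A": -1.2, "B": 0.8}, {"A": 0.5, "B": 0.3}))
```

In the toy call, the hypothetical gene "A" looks down-regulated after normalisation but up-regulated in the raw data, so it is filtered out, while "B" is retained.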
-
Dong Wang,
Lixin Cheng,
Mingyue Wang,
Ruihong Wu,
Pengfei Li,
Bin Li,
Yuannv Zhang,
Yunyan Gu,
Wenyuan Zhao,
Chenguang Wang,
Zheng Guo
ABSTRACT: When using microarray data to study a complex disease such as cancer, it is common practice to normalize the data so that all arrays have the same distribution of probe intensities regardless of the biological groups of the samples. The assumption underlying such normalization is that, in a disease, the majority of genes are not differentially expressed (DE) and the numbers of up- and down-regulated genes are roughly equal. However, accumulating evidence suggests that gene expression could be widely altered in cancer, so we need to evaluate the sensitivity of biological discoveries to violation of the normalization assumption. Here, we analyzed 7 large Affymetrix datasets of pair-matched normal and cancer samples collected in the NCBI GEO database. We showed that in 6 of these 7 datasets, the medians of perfect match (PM) probe intensities increased in the cancer state, and the increases were significant in three datasets, suggesting that the assumption that all arrays have the same median probe intensity regardless of the biological groups of the samples might be misleading. Then, we evaluated the effects of the three most widely used normalization algorithms (RMA, MAS5.0 and dChip) on the selection of DE genes by comparing them with LVS, which relies less on the above-mentioned assumption. The results showed that using RMA, MAS5.0 and dChip may produce many falsely down-regulated DE genes while missing many up-regulated DE genes. At least for cancer studies, normalizing all arrays to have the same distribution of probe intensities regardless of the biological groups of the samples might be misleading. Thus, most current normalizations based on unreliable assumptions may distort biological differences between normal and cancer samples. The LVS algorithm might perform relatively well because it relies less on the above-mentioned assumption. Our results also indicate that genes may be widely up-regulated in most human cancers.
Computational biology and chemistry 06/2011; 35(3):126-30. · 1.37 Impact Factor
-
ABSTRACT: The biological interpretation of the complexity of cancer somatic mutation profiles is a major challenge in current cancer research. It has been suggested that mutations in multiple genes that participate in different pathways are collaborative in conferring growth advantage to tumor cells. Here, we propose a powerful pathway-based approach to study the functional collaboration of gene mutations in carcinogenesis. We successfully identify many pairs of significantly comutated pathways for a large-scale somatic mutation profile of lung adenocarcinoma. We find that the coordinated pathway pairs detected by comutations are also likely to be coaltered by other molecular changes, such as alterations in multifunctional genes in cancer. Then, we cluster comutated pathways into comutated superpathways and show that the derived superpathways also tend to be significantly coaltered by DNA copy number alterations. Our results support the hypothesis that comprehensive cooperation among a few basic functions is required for inducing cancer. The results also suggest biologically plausible models for understanding the heterogeneous mechanisms of cancers. Finally, we suggest an approach to identify candidate cancer genes from the derived comutated pathways. Together, our results provide guidelines to distill the pathway collaboration in carcinogenesis from the complexity of cancer somatic mutation profiles. Hum Mutat 32:1-8, 2011. © 2011 Wiley-Liss, Inc.
Human Mutation 05/2011; · 5.69 Impact Factor
-
Xue Gong,
Ruihong Wu,
Hongwei Wang,
Xinwu Guo,
Dong Wang,
Yunyan Gu,
Yuannv Zhang,
Wenyuan Zhao,
Lixin Cheng,
Chenguang Wang,
Zheng Guo
ABSTRACT: Differential expression of microRNA (miRNA) is involved in many human diseases and could potentially be used as a biomarker for disease diagnosis, prognosis, and therapy. However, inconsistency has often been found among the differentially expressed miRNAs identified in various studies using miRNA arrays for a particular disease such as a cancer. Before miRNA arrays are broadly applied in a clinical setting, it is critical to evaluate inconsistent discoveries in a rational way. Using data sets from 2 types of cancer, our study shows that the differentially expressed miRNAs detected in multiple experiments for each cancer exhibit a stable regulation direction. This result also indicates that miRNA arrays could be used to reliably capture the regulation direction of differentially expressed miRNAs in cancer. We then assumed that 2 differentially expressed miRNAs with the same regulation direction in a particular cancer play similar functional roles if they regulate the same set of cancer-associated genes. On the basis of this hypothesis, we proposed a score to assess the functional consistency between differentially expressed miRNAs separately extracted from multiple studies for a particular cancer. We showed that although the lists of differentially expressed miRNAs identified in different studies for each cancer were highly variable, they were rather consistent at the functional level. Thus, the detection of differentially expressed miRNAs in various experiments for a certain disease tends to be functionally reproducible and to capture functionally related differential expression of miRNAs in the disease.
Molecular Cancer Therapeutics 03/2011; 10(5):752-60. · 5.23 Impact Factor
-
Yunyan Gu,
Da Yang,
Jinfeng Zou,
Wencai Ma,
Ruihong Wu,
Wenyuan Zhao,
Yuannv Zhang,
Hui Xiao,
Xue Gong,
Min Zhang,
Jing Zhu,
Zheng Guo
ABSTRACT: Through high-throughput screens of somatic mutations in cancer genomes, hundreds of cancer genes are being rapidly identified, providing abundant information for systematically deciphering the genetic changes underlying cancer mechanisms. However, the functional collaboration of mutated genes is often neglected in current studies. Here, using four genome-wide somatic mutation data sets and pathways defined in various databases, we showed that gene pairs significantly comutated in cancer samples tend to be distributed between pathways rather than within pathways. At the basic functional level of motifs in the human protein-protein interaction network, we also found that comutated gene pairs were overrepresented between motifs but extremely depleted within motifs. We further showed that Gene Ontology, which describes gene functions at various levels of specificity, allows us to tackle the pathway definition problem to some degree and to study the functional collaboration of gene mutations in cancer genomes more efficiently. Then, by defining pairs of pathways frequently linked by comutated gene pairs as between-pathway models, we showed that they are also likely to be codisrupted by mutations of the interpathway hubs of the coupled pathways, suggesting new hints for understanding the heterogeneous mechanisms of cancers. Finally, we showed that some between-pathway models consisting of important pathways such as cell cycle checkpoint and cell proliferation were codisrupted in most of the cancer samples in this study, suggesting that their codisruption might be functionally essential in inducing these cancers. Altogether, our results provide a way to disentangle the complex collaboration of the molecular processes underlying cancer mechanisms.
Molecular Cancer Therapeutics 08/2010; 9(8):2186-95. · 5.23 Impact Factor
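The abstract above refers to gene pairs that are "significantly comutated" across cancer samples but does not say which test was used. One common way to score such a pair is an upper-tail hypergeometric test on the number of samples carrying both mutations. The sketch below assumes that test purely for illustration; the function name and parameters are hypothetical and not taken from the paper.

```python
from math import comb


def comutation_p_value(n_samples, n_a, n_b, n_both):
    """Upper-tail hypergeometric p-value for co-occurrence of two mutations.

    n_samples: total number of tumour samples
    n_a, n_b:  samples carrying a mutation in gene A / gene B
    n_both:    samples carrying mutations in both genes
    Assumption: samples are exchangeable and the two genes mutate
    independently under the null hypothesis.
    """
    total = comb(n_samples, n_b)
    upper = min(n_a, n_b)
    # P(X >= n_both), where X counts B-mutated samples that are also A-mutated.
    tail = sum(
        comb(n_a, k) * comb(n_samples - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return tail / total


# Toy example: 10 samples, each gene mutated in 3, all 3 shared.
print(comutation_p_value(10, 3, 3, 3))
```

A small p-value suggests the two genes are mutated together more often than chance would predict; in practice such p-values would also need multiple-testing correction across all gene pairs.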
-
ABSTRACT: Hundreds of genes that are causally implicated in oncogenesis have been found and collected in various databases. For efficient application of these abundant but diverse data sources, it is of fundamental importance to evaluate their consistency.
First, we showed that the lists of cancer genes from some major data sources were highly inconsistent in terms of overlapping genes. In particular, most cancer genes accumulated in previous small-scale studies could not be rediscovered in current high-throughput genome screening studies. Then, based on a metric proposed in this study, we showed that most cancer gene lists from different data sources were highly functionally consistent. Finally, we extracted functionally consistent cancer genes from various data sources and collected them in our database F-Census.
Although their gene overlap is very low, most cancer gene data sources are highly consistent at the functional level, which indicates that each of them captures a subset of the genes in a few key pathways associated with cancer. Our results suggest that the sample sizes currently used for cancer studies might be inadequate for consistently capturing individual cancer genes, but could be sufficient for finding a set of cancer genes that functionally represents most cancer genes. The F-Census database provides biologists with a useful tool for browsing and extracting functionally consistent cancer genes from various data sources.
BMC Bioinformatics 02/2010; 11:76. · 2.75 Impact Factor