Yuannv Zhang

Harbin Medical University, Harbin, Heilongjiang, China

Publications (23) · 43.67 Total impact

  • ABSTRACT: Current predictors for estrogen receptor-positive (ER-positive) breast cancer patients receiving tamoxifen often fail in inter-laboratory validation. We aimed to develop a robust predictor based on the relative ordering of expression measurements (ROE) within gene pairs. Using a large integrated dataset of 420 normal controls and 1,129 ER-positive breast tumor samples, we identified the gene pairs with stable ROEs in normal controls and significantly reversed ROEs in ER-positive tumors. Using these gene pairs, we characterized each sample of a cohort of 292 ER-positive patients who received tamoxifen monotherapy for 5 years and then identified relapse risk-associated gene pairs. Using a genetic algorithm, we extracted a gene pair subset that yielded the largest positive and negative predictive values for predicting 10-year relapse-free survival (RFS). A predictor was developed based on this gene pair subset and was validated in 2 large multi-laboratory cohorts (N = 250 and 248, respectively) of ER-positive patients who received 5-year tamoxifen alone. In the first validation cohort, the patients predicted to be tamoxifen sensitive had a 10-year RFS of 91% (95% confidence interval [CI] 85-97%) with an absolute risk reduction of 34% (95% CI 17-51%). The patients predicted to be tamoxifen insensitive had a significantly higher relapse risk than the patients predicted to be tamoxifen sensitive (hazard ratio = 4.99, 95% CI 2.45-10.17, P = 9.13 × 10^-7). Similar performance was achieved in the second validation cohort. The predictor performed well in both node-negative and node-positive subsets and added significant predictive power to the clinical parameters. In contrast, 2 previously proposed predictors did not perform significantly better than the baselines of the validation cohorts.
    In summary, the proposed predictor can accurately and robustly predict tamoxifen sensitivity of ER-positive breast cancer patients and identify patients with a high probability of 10-year RFS following tamoxifen monotherapy.
    No preview · Article · Nov 2013 · Breast Cancer Research and Treatment
  • Source
    ABSTRACT: Directly comparing gene expression profiles of estrogen receptor-positive (ER+) and estrogen receptor-negative (ER-) breast cancers cannot determine whether differentially expressed genes between these two subtypes result from dysregulated expression in ER+ cancer or ER- cancer versus normal controls, and thus would miss critical information for elucidating the transcriptomic difference between the two subtypes. Using microarray datasets from TCGA, we classified the genes dysregulated in both ER+ and ER- cancers versus normal controls into two classes: (i) genes dysregulated in the same direction but to a different extent, and (ii) genes dysregulated to opposite directions, and then validated the two classes in RNA-sequencing datasets of independent cohorts. We showed that the genes dysregulated to a larger extent in ER+ cancers than in ER- cancers enriched in glycerophospholipid and polysaccharide metabolic processes, while the genes dysregulated to a larger extent in ER- cancers than in ER+ cancers enriched in cell proliferation. Phosphorylase kinase and enzymes of glycosylphosphatidylinositol (GPI) anchor biosynthesis were upregulated to a larger extent in ER+ cancers than in ER- cancers, whereas glycogen synthase and phospholipase A2 were downregulated to a larger extent in ER+ cancers than in ER- cancers. We also found that the genes oppositely dysregulated in the two subtypes significantly enriched with known cancer genes and tended to closely collaborate with the cancer genes. Furthermore, we showed the possibility that these oppositely dysregulated genes could contribute to carcinogenesis of ER+ and ER- cancers through rewiring different subpathways. GPI-anchor biosynthesis and glycogenolysis were elevated and hydrolysis of phospholipids was depleted to a larger extent in ER+ cancers than in ER- cancers. 
    Our findings indicate that the genes oppositely dysregulated in the two subtypes are potential cancer genes which could contribute to carcinogenesis of both ER+ and ER- cancers through rewiring different subpathways.
    Full-text · Article · Jul 2013 · PLoS ONE
  • ABSTRACT: In microarray-based case-control studies of a disease, researchers often attempt to identify a few diagnostic or prognostic markers among the most significant differentially expressed (DE) genes. However, the reproducibility of DE genes identified in different studies of the same disease is typically very low. To tackle this problem, we evaluated the reproducibility of DE genes across studies and defined robust markers for disease diagnosis using disease-associated protein-protein interaction (PPI) subnetworks. Using datasets for four cancer types, we found that the most significant DE genes in cancer exhibit consistent up- or down-regulation in different datasets. For each cancer type, the 5 (or 10) most significant DE genes separately extracted from different datasets tend to be significantly coexpressed and closely connected in the PPI subnetwork, indicating that they are highly reproducible at the PPI level. Consequently, we were able to build robust subnetwork-based classifiers for cancer diagnosis.
    No preview · Article · May 2013 · Gene
  • Source
    Dataset: Table S6
    ABSTRACT: The comparison between the altered frequencies of C-oncogenes (C-TSGs) and the expected altered frequencies. (XLS)
    Preview · Dataset · Mar 2013
  • Source
    Dataset: Table S1
    ABSTRACT: Directional agreement of the union of DE and DM genes across two datasets for each cancer type. (XLS)
    Preview · Dataset · Mar 2013
  • Source
    Dataset: Table S4
    ABSTRACT: All cancer genes with differential expression, methylation or copy number changes in their corresponding cancer types. (XLS)
    Preview · Dataset · Mar 2013
  • Source
    Dataset: Table S3
    ABSTRACT: Directional agreement of non-overlapping DE, DM and CNA genes in the third dataset for each cancer type. (XLS)
    Preview · Dataset · Mar 2013
  • Source
    ABSTRACT: It is a common practice that researchers collect a set of samples without discriminating the mutants and their wild-type counterparts to characterize the transcriptional, methylational and/or copy number changes of pre-defined candidate oncogenes or tumor suppressor genes (TSGs), although some examples are known that carcinogenic mutants may express and function completely differently from their wild-type counterparts. Based on various high-throughput data without mutation information for typical cancer types, we surprisingly found that about half of known oncogenes (or TSGs) pre-defined by mutations were down-regulated (or up-regulated) and hypermethylated (or hypomethylated) in their corresponding cancer types. Therefore, the overall expression and/or methylation changes of genes detected in a set of samples without discriminating the mutants and their wild-type counterparts cannot indicate the carcinogenic roles of the mutants. We also found that about half of known oncogenes were located in deletion regions, whereas all known TSGs were located in deletion regions. Thus, both oncogenes and TSGs may be located in deletion regions and thus deletions can indicate TSGs only if the gene is found to be deleted as a whole. In contrast, amplifications are restricted to oncogenes and thus can be used to support either the dysregulated wild-type gene or its mutant as an oncogene. We demonstrated that using the transcriptional, methylational and/or copy number changes without mutation information to characterize oncogenes and TSGs, which is a currently still widely adopted strategy, will most often produce misleading results. Our analysis highlights the importance of evaluating expression, methylation and copy number changes together with gene mutation data in the same set of samples in order to determine the distinct roles of the mutants and their wild-type counterparts.
    Preview · Article · Mar 2013 · PLoS ONE
  • Source
    Dataset: Table S7
    ABSTRACT: C-oncogenes, c-TSGs and stability c-TSGs for nine cancer types. (XLS)
    Preview · Dataset · Mar 2013
  • Source
    Dataset: Table S5
    ABSTRACT: Down-regulated or hypermethylated c-oncogenes involved in cell differentiation. (XLS)
    Preview · Dataset · Mar 2013
  • Source
    Dataset: Table S8
    ABSTRACT: Expression, methylation and copy number datasets used in this study. (XLS)
    Preview · Dataset · Mar 2013
  • Source
    Dataset: Table S2
    ABSTRACT: Directional agreement of the DE and DM genes, which were significantly changed in one dataset and at least marginally significantly changed in another dataset, across two datasets for each cancer type. (XLS)
    Preview · Dataset · Mar 2013
  • Source
    ABSTRACT: The heterogeneity of genetic alterations in human cancer genomes presents a major challenge to advancing our understanding of cancer mechanisms and identifying cancer driver genes. To tackle this heterogeneity problem, many approaches have been proposed to investigate genetic alterations and predict driver genes at the individual pathway level. However, most of these approaches ignore the correlation of alteration events between pathways and miss many genes with rare alterations collectively contributing to carcinogenesis. Here, we devise a network-based approach to capture the cooperative functional modules hidden in genome-wide somatic mutation and copy number alteration profiles of glioblastoma (GBM) from The Cancer Genome Atlas (TCGA), where a module is a set of altered genes with dense interactions in the protein interaction network. We identify 7 pairs of significantly co-altered modules that involve the main pathways known to be altered in GBM (TP53, RB and RTK signaling pathways) and highlight the striking co-occurring alterations among these GBM pathways. By taking into account the non-random correlation of gene alterations, the property of co-alteration could distinguish oncogenic modules that contain driver genes involved in the progression of GBM. The collaboration among cancer pathways suggests that the redundant models and aggravating models could shed new light on the potential mechanisms during carcinogenesis and provide new indications for the design of cancer therapeutic strategies.
    Full-text · Article · Jan 2013 · Molecular BioSystems
  • Source
    ABSTRACT: Some researchers normalize DNA methylation array data to remove the technical artifacts introduced by experimental differences in sample preparation, array processing and other factors. Others analyze DNA methylation arrays without normalization, on the grounds that current normalization methods may distort real differences between normal and cancer samples, because cancer genomes may be extensively hypomethylated and the total amount of CpG methylation may differ substantially among samples. In this study, using eight datasets from the Infinium HumanMethylation27 assay, we systematically analyzed the global distribution of DNA methylation changes in cancer compared with normal controls and its effect on data normalization for selecting differentially methylated (DM) genes. We showed that more DM genes could be found in the Quantile/Lowess-normalized data than in the non-normalized data. The DM genes additionally selected in the Quantile/Lowess-normalized data showed significantly consistent methylation states in another independent dataset for the same cancer, indicating that these extra DM genes were effective biological signals related to the disease. These results suggest that normalization can increase the power of detecting DM genes in the context of diagnostic markers, which are usually characterized by relatively large effect sizes. In addition, we evaluated the reproducibility of DM discoveries for a particular cancer type and found that most of the DM genes additionally detected in one dataset showed the same methylation directions in the other dataset for the same cancer type, indicating that these DM genes were effective biological signals in the other dataset. Furthermore, we showed that some DM genes detected in different studies for a particular cancer type were significantly reproducible at the functional level.
    Full-text · Article · Jul 2012 · Gene
  • Source
    ABSTRACT: Based on the assumption that only a few genes are differentially expressed in a disease and have balanced upward and downward expression level changes, researchers usually normalise microarray data by forcing all of the arrays to have the same probe intensity distributions to remove technical variations in the data. However, accumulated evidence suggests that gene expressions could be widely altered in cancer, so we need to evaluate the sensitivities of biological discoveries to violation of the normalisation assumption. Here, we show that the medians of the original probe intensities increase in most of the ten cancer types analyzed in this paper, indicating that genes may be widely up-regulated in many cancer types. Thus, at least for cancer study, normalising all arrays to have the same distribution of probe intensities regardless of the state (diseased vs. normal) tends to falsely produce many down-regulated differentially expressed (DE) genes while missing many truly up-regulated DE genes. We also show that the DE genes solely detected in the non-normalised data for cancers are highly reproducible across different datasets for the same cancers, indicating that effective biological signals naturally exist in the non-normalised data. Because the powers of current statistical analyses using the non-normalised data tend to be low, we suggest selecting DE genes in both normalised and non-normalised data and then filter out the false DE genes extracted from the normalised data that show opposite deregulation directions in the non-normalised data.
    Full-text · Article · Mar 2012 · Molecular BioSystems
  • ABSTRACT: The biological interpretation of the complexity of cancer somatic mutation profiles is a major challenge in current cancer research. It has been suggested that mutations in multiple genes that participate in different pathways are collaborative in conferring growth advantage to tumor cells. Here, we propose a powerful pathway-based approach to study the functional collaboration of gene mutations in carcinogenesis. We successfully identify many pairs of significantly comutated pathways for a large-scale somatic mutation profile of lung adenocarcinoma. We find that the coordinated pathway pairs detected by comutations are also likely to be coaltered by other molecular changes, such as alterations in multifunctional genes in cancer. Then, we cluster comutated pathways into comutated superpathways and show that the derived superpathways also tend to be significantly coaltered by DNA copy number alterations. Our results support the hypothesis that comprehensive cooperation among a few basic functions is required for inducing cancer. The results also suggest biologically plausible models for understanding the heterogeneous mechanisms of cancers. Finally, we suggest an approach to identify candidate cancer genes from the derived comutated pathways. Together, our results provide guidelines to distill the pathway collaboration in carcinogenesis from the complexity of cancer somatic mutation profiles. Hum Mutat 32:1-8, 2011. © 2011 Wiley-Liss, Inc.
    No preview · Article · Sep 2011 · Human Mutation
  • Source
    ABSTRACT: When using microarray data to study a complex disease such as cancer, it is common practice to normalize the data so that all arrays have the same distribution of probe intensities regardless of the biological groups of the samples. The assumption underlying such normalization is that, in a disease, the majority of genes are not differentially expressed (DE) and the numbers of up- and down-regulated genes are roughly equal. However, accumulating evidence suggests that gene expression could be widely altered in cancer, so we need to evaluate the sensitivity of biological discoveries to violation of the normalization assumption. Here, we analyzed 7 large Affymetrix datasets of pair-matched normal and cancer samples collected in the NCBI GEO database. We showed that in 6 of these 7 datasets, the medians of perfect match (PM) probe intensities increased in the cancer state and the increases were significant in three datasets, suggesting that the assumption that all arrays have the same median probe intensities regardless of the biological groups of samples might be misleading. Then, we evaluated the effects of the three currently most widely used normalization algorithms (RMA, MAS5.0 and dChip) on the selection of DE genes by comparing them with LVS, which relies less on the above-mentioned assumption. The results showed that using RMA, MAS5.0 and dChip may produce many false down-regulated DE genes while missing many up-regulated DE genes. At least for cancer studies, normalizing all arrays to have the same distribution of probe intensities regardless of the biological groups of samples might be misleading. Thus, most current normalizations, being based on unreliable assumptions, may distort biological differences between normal and cancer samples. The LVS algorithm might perform relatively well because it relies less on the above-mentioned assumption. Our results also indicate that genes may be widely up-regulated in most human cancers.
    Full-text · Article · Jun 2011 · Computational Biology and Chemistry
  • Source
    ABSTRACT: Differential expression of microRNA (miRNA) is involved in many human diseases and could potentially be used as a biomarker for disease diagnosis, prognosis, and therapy. However, inconsistency has often been found among differentially expressed miRNAs identified in various studies when using miRNA arrays for a particular disease such as a cancer. Before broadly applying miRNA arrays in a clinical setting, it is critical to evaluate inconsistent discoveries in a rational way. Thus, using data sets from 2 types of cancers, our study shows that the differentially expressed miRNAs detected from multiple experiments for each cancer exhibit stable regulation direction. This result also indicates that miRNA arrays could be used to reliably capture the signals of the regulation direction of differentially expressed miRNAs in cancer. We then assumed that 2 differentially expressed miRNAs with the same regulation direction in a particular cancer play similar functional roles if they regulate the same set of cancer-associated genes. On the basis of this hypothesis, we proposed a score to assess the functional consistency between differentially expressed miRNAs separately extracted from multiple studies for a particular cancer. We showed although lists of differentially expressed miRNAs identified from different studies for each cancer were highly variable, they were rather consistent at the level of function. Thus, the detection of differentially expressed miRNAs in various experiments for a certain disease tends to be functionally reproducible and capture functionally related differential expression of miRNAs in the disease.
    Full-text · Article · Mar 2011 · Molecular Cancer Therapeutics
  • Source
    ABSTRACT: By high-throughput screens of somatic mutations of genes in cancer genomes, hundreds of cancer genes are being rapidly identified, providing us abundant information for systematically deciphering the genetic changes underlying cancer mechanism. However, the functional collaboration of mutated genes is often neglected in current studies. Here, using four genome-wide somatic mutation data sets and pathways defined in various databases, we showed that gene pairs significantly comutated in cancer samples tend to distribute between pathways rather than within pathways. At the basic functional level of motifs in the human protein-protein interaction network, we also found that comutated gene pairs were overrepresented between motifs but extremely depleted within motifs. Specifically, we showed that based on Gene Ontology that describes gene functions at various specific levels, we could tackle the pathway definition problem to some degree and study the functional collaboration of gene mutations in cancer genomes more efficiently. Then, by defining pairs of pathways frequently linked by comutated gene pairs as the between-pathway models, we showed they are also likely to be codisrupted by mutations of the interpathway hubs of the coupled pathways, suggesting new hints for understanding the heterogeneous mechanisms of cancers. Finally, we showed some between-pathway models consisting of important pathways such as cell cycle checkpoint and cell proliferation were codisrupted in most cancer samples under this study, suggesting that their codisruptions might be functionally essential in inducing these cancers. All together, our results would provide a channel to detangle the complex collaboration of the molecular processes underlying cancer mechanism.
    Full-text · Article · Aug 2010 · Molecular Cancer Therapeutics
  • ABSTRACT: The causation of cancer often involves the joint deregulation of multiple biological processes. Thus, it is of interest to extract multi-function features of cancer genes and study their functional coordination in tumorigenesis. Here, based on Gene Ontology, we propose a heuristic strategy to extract multi-function features that are significantly overrepresented among cancer genes. We show that the extracted feature combinations can provide hints for studying the functional interplay of multi-function genes involved in tumorigenesis. Using protein-protein interaction data, we found that the pivot genes of the obtained functional feature combinations are likely to be cancer genes, and that their alterations may play important roles in simultaneously deregulating multiple functions and thus inducing cancer.
    No preview · Conference Paper · Aug 2010
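
The relative-ordering idea described in the tamoxifen predictor abstract (Nov 2013) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy expression values and the two cut-offs (`stable`, `flipped`) are assumptions made for illustration, and the genetic-algorithm selection and survival analysis mentioned in the abstract are not covered. The sketch only shows how a gene pair with a stable within-sample ordering in normal samples and a reversed ordering in tumor samples might be detected and then used to characterize a sample.

```python
from itertools import combinations


def fraction_above(samples, a, b):
    """Fraction of samples in which gene a is expressed above gene b."""
    return sum(s[a] > s[b] for s in samples) / len(samples)


def reversed_pairs(normals, tumors, genes, stable=0.95, flipped=0.75):
    """Return (high, low) pairs whose ordering high > low holds in at least
    `stable` of the normal samples and is reversed in at least `flipped`
    of the tumor samples. Both cut-offs are hypothetical."""
    pairs = []
    for a, b in combinations(genes, 2):
        for hi, lo in ((a, b), (b, a)):
            if (fraction_above(normals, hi, lo) >= stable
                    and fraction_above(tumors, lo, hi) >= flipped):
                pairs.append((hi, lo))
    return pairs


def signature(sample, pairs):
    """Binary vector: 1 where the sample shows the reversed (tumor-like) ordering."""
    return [int(sample[lo] > sample[hi]) for hi, lo in pairs]


# Toy expression values (hypothetical gene names and numbers).
normals = [
    {"G1": 8.0, "G2": 5.0, "G3": 6.0},
    {"G1": 7.5, "G2": 4.0, "G3": 6.5},
    {"G1": 9.0, "G2": 5.5, "G3": 7.0},
]
tumors = [
    {"G1": 4.0, "G2": 6.0, "G3": 6.5},
    {"G1": 3.5, "G2": 5.0, "G3": 7.0},
    {"G1": 5.0, "G2": 6.5, "G3": 4.0},
]

pairs = reversed_pairs(normals, tumors, ["G1", "G2", "G3"])
print(pairs)                        # [('G1', 'G2')]
print(signature(tumors[0], pairs))  # [1]
```

Because only the ordering within a sample is used, the comparison does not depend on how intensities are scaled between arrays, which is one plausible reason such a rule could transfer across laboratories.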