-
Xue Gong,
Ruihong Wu,
Hongwei Wang,
Xinwu Guo,
Dong Wang,
Yunyan Gu,
Yuannv Zhang,
Wenyuan Zhao,
Lixin Cheng,
Chenguang Wang,
Zheng Guo
ABSTRACT: Differential expression of microRNA (miRNA) is involved in many human diseases and could potentially be used as a biomarker for disease diagnosis, prognosis, and therapy. However, inconsistency has often been found among differentially expressed miRNAs identified in various studies when using miRNA arrays for a particular disease such as a cancer. Before broadly applying miRNA arrays in a clinical setting, it is critical to evaluate inconsistent discoveries in a rational way. Thus, using data sets from 2 types of cancers, our study shows that the differentially expressed miRNAs detected from multiple experiments for each cancer exhibit stable regulation direction. This result also indicates that miRNA arrays could be used to reliably capture the signals of the regulation direction of differentially expressed miRNAs in cancer. We then assumed that 2 differentially expressed miRNAs with the same regulation direction in a particular cancer play similar functional roles if they regulate the same set of cancer-associated genes. On the basis of this hypothesis, we proposed a score to assess the functional consistency between differentially expressed miRNAs separately extracted from multiple studies for a particular cancer. We showed that although lists of differentially expressed miRNAs identified from different studies for each cancer were highly variable, they were rather consistent at the level of function. Thus, the detection of differentially expressed miRNAs in various experiments for a certain disease tends to be functionally reproducible and to capture functionally related differential expression of miRNAs in the disease.
Molecular Cancer Therapeutics 03/2011; 10(5):752-60. · 5.23 Impact Factor
-
Yunyan Gu,
Da Yang,
Jinfeng Zou,
Wencai Ma,
Ruihong Wu,
Wenyuan Zhao,
Yuannv Zhang,
Hui Xiao,
Xue Gong,
Min Zhang,
Jing Zhu,
Zheng Guo
ABSTRACT: By high-throughput screens of somatic mutations of genes in cancer genomes, hundreds of cancer genes are being rapidly identified, providing us with abundant information for systematically deciphering the genetic changes underlying cancer mechanisms. However, the functional collaboration of mutated genes is often neglected in current studies. Here, using four genome-wide somatic mutation data sets and pathways defined in various databases, we showed that gene pairs significantly comutated in cancer samples tend to be distributed between pathways rather than within pathways. At the basic functional level of motifs in the human protein-protein interaction network, we also found that comutated gene pairs were overrepresented between motifs but extremely depleted within motifs. Specifically, we showed that by using Gene Ontology, which describes gene functions at various levels of specificity, we could tackle the pathway definition problem to some degree and study the functional collaboration of gene mutations in cancer genomes more efficiently. Then, by defining pairs of pathways frequently linked by comutated gene pairs as the between-pathway models, we showed that they are also likely to be codisrupted by mutations of the interpathway hubs of the coupled pathways, suggesting new hints for understanding the heterogeneous mechanisms of cancers. Finally, we showed that some between-pathway models consisting of important pathways such as cell cycle checkpoint and cell proliferation were codisrupted in most cancer samples in this study, suggesting that their codisruption might be functionally essential in inducing these cancers. Altogether, our results provide a way to disentangle the complex collaboration of the molecular processes underlying cancer mechanisms.
Molecular Cancer Therapeutics 08/2010; 9(8):2186-95. · 5.23 Impact Factor
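The abstract above refers to gene pairs that are "significantly comutated" but does not say which statistical test was used. As an illustration only, the sketch below uses a one-sided hypergeometric tail (equivalent to a one-sided Fisher's exact test), which is one common choice for this kind of question. The function name and the example counts are hypothetical and are not taken from the paper.

```python
from math import comb


def comutation_p_value(n_samples, n_a, n_b, n_both):
    """One-sided hypergeometric tail probability of co-mutation.

    n_samples: total number of tumour samples
    n_a, n_b:  samples in which gene A (resp. gene B) is mutated
    n_both:    samples in which both genes are mutated
    Returns P(X >= n_both) if the mutations were placed independently.
    This is an assumed test, not necessarily the one used in the paper.
    """
    upper = min(n_a, n_b)
    total = comb(n_samples, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_samples - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return tail / total


# Hypothetical counts: 100 samples, gene A mutated in 10, gene B in 8,
# and both mutated in 5 of them.
p = comutation_p_value(100, 10, 8, 5)
print(f"one-sided p-value: {p:.3g}")
```

A small p-value here only says that the overlap is larger than expected under independence; in a genome-wide screen it would still need a multiple-testing correction.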
-
Jing Zhu,
Hui Xiao,
Xiaopei Shen,
Jing Wang,
Jinfeng Zou,
Lin Zhang,
Da Yang,
Wencai Ma,
Chen Yao,
Xue Gong,
Min Zhang,
Yang Zhang,
Zheng Guo
ABSTRACT: MOTIVATION: Studying the evolutionary conservation of cancer genes can improve our understanding of the genetic basis of human cancers. Functionally related proteins encoded by genes tend to interact with each other in a modular fashion, which may affect both the mode and tempo of their evolution. RESULTS: In the human PPI network, we searched for subnetworks within each of which all proteins have evolved at similar rates since the human and mouse split. Identified at a given co-evolving level, the subnetworks with non-randomly large sizes were defined as co-evolving modules. We showed that proteins within modules tend to be conserved, evolutionarily old and enriched with housekeeping genes, while proteins outside modules tend to be less-conserved, evolutionarily younger and enriched with genes expressed in specific tissues. Viewing cancer genes from co-evolving modules showed that the overall conservation of cancer genes should be mainly attributed to the cancer proteins enriched in the conserved modules. Functional analysis further suggested that cancer proteins within and outside modules might play different roles in carcinogenesis, providing a new hint for studying the mechanism of cancer.
Bioinformatics 02/2010; 26(7):919-24. · 5.47 Impact Factor
-
ABSTRACT: Hundreds of genes that are causally implicated in oncogenesis have been found and collected in various databases. For efficient application of these abundant but diverse data sources, it is of fundamental importance to evaluate their consistency.
First, we showed that the lists of cancer genes from some major data sources were highly inconsistent in terms of overlapping genes. In particular, most cancer genes accumulated in previous small-scale studies could not be rediscovered in current high-throughput genome screening studies. Then, based on a metric proposed in this study, we showed that most cancer gene lists from different data sources were highly functionally consistent. Finally, we extracted functionally consistent cancer genes from various data sources and collected them in our database F-Census.
Although they have very low gene overlapping, most cancer gene data sources are highly consistent at the functional level, which indicates that they can separately capture partial genes in a few key pathways associated with cancer. Our results suggest that the sample sizes currently used for cancer studies might be inadequate for consistently capturing individual cancer genes, but could be sufficient for finding a number of cancer genes that could represent functionally most cancer genes. The F-Census database provides biologists with a useful tool for browsing and extracting functionally consistent cancer genes from various data sources.
BMC Bioinformatics 02/2010; 11:76. · 2.75 Impact Factor
-
Jing Zhu,
Hui Xiao,
Xiaopei Shen,
Jing Wang,
Jinfeng Zou,
Lin Zhang,
Da Yang,
Wencai Ma,
Chen Yao,
Xue Gong,
Min Zhang,
Yang Zhang,
Zheng Guo
Bioinformatics. 01/2010; 26:919-924.
-
Zheng Guo,
Yongjin Li,
Xue Gong,
Chen Yao,
Wencai Ma,
Dong Wang,
Yanhui Li,
Jing Zhu,
Min Zhang,
Da Yang,
Jing Wang
Bioinformatics. 01/2009; 25:1574.
-
Min Zhang,
Chen Yao,
Zheng Guo,
Jinfeng Zou,
Lin Zhang,
Hui Xiao,
Dong Wang,
Da Yang,
Xue Gong,
Jing Zhu,
Yanhui Li,
Xia Li
ABSTRACT: MOTIVATION: Differentially expressed gene (DEG) lists detected from different microarray studies for the same disease are often highly inconsistent. Even in technical replicate tests using identical samples, DEG detection still shows very low reproducibility. It is often believed that current small microarray studies will largely introduce false discoveries. RESULTS: Based on a statistical model, we show that even in technical replicate tests using identical samples, it is highly likely that the selected DEG lists will be very inconsistent in the presence of small measurement variations. Therefore, the apparently low reproducibility of DEG detection from current technical replicate tests does not indicate low quality of microarray technology. We also demonstrate that heterogeneous biological variations existing in real cancer data will further reduce the overall reproducibility of DEG detection. Nevertheless, in small subsamples from both simulated and real data, the actual false discovery rate (FDR) for each DEG list tends to be low, suggesting that each separately determined list may comprise mostly true DEGs. Rather than simply counting the overlaps of the discovery lists from different studies for a complex disease, novel metrics are needed for evaluating the reproducibility of discoveries characterized by correlated molecular changes. Supplementary information: Supplementary data are available at Bioinformatics online.
Bioinformatics 09/2008; 24(18):2057-63. · 5.47 Impact Factor
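The central claim of the abstract above (two small experiments can yield DEG lists that overlap poorly even though each list has a low false discovery rate) can be explored with a toy simulation. The sketch below is not the statistical model of the paper: the number of genes, the effect size, the noise level and the use of a Welch-style t statistic are all assumptions chosen only to make the idea concrete.

```python
import random
import statistics


def t_statistic(group_a, group_b):
    """Welch-style t statistic for two lists of measurements."""
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    se = (statistics.variance(group_a) / len(group_a)
          + statistics.variance(group_b) / len(group_b)) ** 0.5
    return mean_diff / se if se > 0 else 0.0


def top_k_genes(effects, n_per_group, noise_sd, k, rng):
    """Simulate one experiment and return the top-k gene indices by |t|."""
    scores = []
    for gene, effect in enumerate(effects):
        cases = [rng.gauss(effect, noise_sd) for _ in range(n_per_group)]
        controls = [rng.gauss(0.0, noise_sd) for _ in range(n_per_group)]
        scores.append((abs(t_statistic(cases, controls)), gene))
    scores.sort(reverse=True)
    return {gene for _, gene in scores[:k]}


def overlap_fraction(list_a, list_b):
    """Fraction of the genes in list_a that also appear in list_b."""
    return len(set(list_a) & set(list_b)) / len(set(list_a))


def false_discovery_rate(selected, effects):
    """Fraction of selected genes whose simulated true effect is zero."""
    return sum(1 for g in selected if effects[g] == 0.0) / len(selected)


# Assumed toy setting: 1000 genes, 300 of them truly differential.
rng = random.Random(0)
effects = [1.0] * 300 + [0.0] * 700
k = 50
list_1 = top_k_genes(effects, n_per_group=5, noise_sd=0.5, k=k, rng=rng)
list_2 = top_k_genes(effects, n_per_group=5, noise_sd=0.5, k=k, rng=rng)

print("overlap between replicate lists:", overlap_fraction(list_1, list_2))
print("FDR of list 1:", false_discovery_rate(list_1, effects))
print("FDR of list 2:", false_discovery_rate(list_2, effects))
```

Varying `n_per_group`, `noise_sd` and `k` shows how the overlap and the per-list FDR respond differently to sample size and noise, which is the distinction the abstract draws.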
-
ABSTRACT: Selecting feature genes for disease prediction is one of the most important applications of microarray technology. However, gene lists obtained in different studies for the same clinical type of patients often differ widely and have few genes in common. Recent research suggests that gene lists ranked by fold change are more reproducible than those ranked by t-test. Here, based on the resampling method, we use training sets of different sizes to select features as top-ranked by P-value of t-test, d-value of SAM, and fold change. Then, we evaluate the stability and the disease classification power of each top-ranked gene list. Our results suggest that for disease classification, gene lists selected through d-value ranking are most suitable with respect to both reproducibility and classification power.
BioMedical Engineering and Informatics, 2008. BMEI 2008. International Conference on; 06/2008
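The three ranking criteria named in the abstract above can be compared on resampled training sets roughly as follows. This is a minimal sketch under stated assumptions, not the procedure of the paper: the SAM fudge factor `s0` is fixed to an arbitrary constant (SAM normally estimates it from the data), ranking by |t| stands in for ranking by P-value, stability is measured as the mean pairwise overlap of top-k lists, and the data are simulated.

```python
import random
import statistics


def fold_change(cases, controls):
    """Difference of group means (a log fold change for log-scale data)."""
    return statistics.mean(cases) - statistics.mean(controls)


def pooled_standard_error(cases, controls):
    """Pooled standard error of the difference of two group means."""
    n1, n2 = len(cases), len(controls)
    pooled_var = ((n1 - 1) * statistics.variance(cases)
                  + (n2 - 1) * statistics.variance(controls)) / (n1 + n2 - 2)
    return (pooled_var * (1 / n1 + 1 / n2)) ** 0.5


def t_score(cases, controls):
    """Two-sample t statistic with pooled variance."""
    return fold_change(cases, controls) / pooled_standard_error(cases, controls)


def d_score(cases, controls, s0=0.1):
    """SAM-style d statistic; s0 is an assumed constant fudge factor."""
    return fold_change(cases, controls) / (
        pooled_standard_error(cases, controls) + s0)


def top_genes(data, labels, score, k):
    """Return the k genes with the largest absolute score."""
    def abs_score(gene):
        cases = [v for v, y in zip(data[gene], labels) if y == 1]
        controls = [v for v, y in zip(data[gene], labels) if y == 0]
        return abs(score(cases, controls))
    return set(sorted(data, key=abs_score, reverse=True)[:k])


def stability(data, labels, score, k, n_per_class, n_rounds, rng):
    """Mean pairwise overlap of top-k lists from random training subsets."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    lists = []
    for _ in range(n_rounds):
        chosen = rng.sample(pos, n_per_class) + rng.sample(neg, n_per_class)
        subset = {g: [vals[i] for i in chosen] for g, vals in data.items()}
        lists.append(top_genes(subset, [labels[i] for i in chosen], score, k))
    overlaps = [len(a & b) / k
                for i, a in enumerate(lists) for b in lists[i + 1:]]
    return statistics.mean(overlaps)


# Simulated data: 200 genes, 10 cases and 10 controls, 40 genes shifted.
rng = random.Random(1)
labels = [1] * 10 + [0] * 10
data = {
    f"gene{g}": [rng.gauss(1.0 if (g < 40 and y == 1) else 0.0, 1.0)
                 for y in labels]
    for g in range(200)
}
for name, score in [("fold change", fold_change),
                    ("t", t_score), ("d", d_score)]:
    value = stability(data, labels, score, k=20, n_per_class=5,
                      n_rounds=10, rng=rng)
    print(f"{name}: mean pairwise overlap = {value:.2f}")
```

Classification power, the second criterion in the abstract, is not covered here; it would require training a classifier on each selected list and testing it on held-out samples.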
-
Zheng Guo,
Lei Wang,
Yongjin Li,
Xue Gong,
Chen Yao,
Wencai Ma,
Dong Wang,
Yanhui Li,
Jing Zhu,
Min Zhang,
Da Yang,
Shaoqi Rao,
Jing Wang
ABSTRACT: Current high-throughput protein-protein interaction (PPI) data do not provide information about the condition(s) under which the interactions occur. Thus, the identification of condition-responsive PPI sub-networks is of great importance for investigating how a living cell adapts to changing environments.
In this article, we propose a novel edge-based scoring and searching approach to extract a PPI sub-network responsive to conditions related to some investigated gene expression profiles. Using this approach, what we constructed is a sub-network connected by the selected edges (interactions), instead of only a set of vertices (proteins) as in previous works. Furthermore, we suggest a systematic approach to evaluate the biological relevance of the identified responsive sub-network by its ability of capturing condition-relevant functional modules. We apply the proposed method to analyze a human prostate cancer dataset and a yeast cell cycle dataset. The results demonstrate that the edge-based method is able to efficiently capture relevant protein interaction behaviors under the investigated conditions.
Supplementary data are available at Bioinformatics online.
Bioinformatics 09/2007; 23(16):2121-8. · 5.47 Impact Factor
-
Zheng Guo,
Yongjin Li,
Xue Gong,
Chen Yao,
Wencai Ma,
Dong Wang,
Yanhui Li,
Jing Zhu,
Min Zhang,
Da Yang,
Jing Wang
Bioinformatics. 01/2007; 23:2121-2128.
-
ABSTRACT: Selecting feature genes for disease prediction is one of the most important applications of microarray technology. However, gene lists obtained in different studies for the same clinical type of patients often differ widely and have few genes in common. Recent research suggests that gene lists ranked by fold change are more reproducible than those ranked by t-test. Here, based on the resampling method, we use training sets of different sizes to select features as top-ranked by P-value of t-test, d-value of SAM, and fold change. Then, we evaluate the stability and the disease classification power of each top-ranked gene list. Our results suggest that for disease classification, gene lists selected through d-value ranking are most suitable with respect to both reproducibility and classification power.
BioMedical Engineering and Informatics, International Conference on. 1:265-268.