A new type of stochastic dependence revealed in gene expression data.

Department of Probability and Statistics, Charles University.
Statistical Applications in Genetics and Molecular Biology (Impact Factor: 1.52). 02/2006; 5:Article7. DOI: 10.2202/1544-6115.1189
Source: PubMed

ABSTRACT Modern methods of microarray data analysis are biased towards selecting those genes that display the most pronounced differential expression. The magnitude of differential expression does not necessarily indicate biological significance, and other criteria are needed to supplement the information on differential expression. Three large sets of microarray data on childhood leukemia were analyzed by an original method introduced in this paper. A new type of stochastic dependence between expression levels in gene pairs was deciphered by our analysis. This modulation-like unidirectional dependence between expression signals arises when the expression of a "gene-modulator" is stochastically proportional to that of a "gene-driver". A total of more than 35% of all pairs formed from 12550 genes were conservatively estimated to belong to this type. There are genes that tend to form Type A relationships with the overwhelming majority of genes. However, this picture is not static: the composition of Type A gene pairs may undergo dramatic changes when comparing two phenotypes. The ability to identify genes that act as "modulators" provides a potential strategy for prioritizing candidate genes.

  • ABSTRACT: One of the goals of a microarray experiment is to find genes that are differentially expressed between two or more stages. The problem is that gene expression levels are highly correlated. Here we consider ratios of gene expression levels formed from unordered or from ordered pairs of genes. For the HYPERDIP and TEL data (different stages of childhood leukemia), the ratios for different pairs appear to be approximately independent. For each situation we estimate p-values for testing which genes, or which of their ratios, are differentially expressed. We display these p-values in histograms and compare their shapes. Our comparative study shows that the histograms of p-values computed from gene expression levels and those computed from their ratios have essentially different shapes.
  • ABSTRACT: This paper proposes a new method for preliminary identification of gene regulatory networks (GRNs) from gene microarray cancer data based on ridge partial least squares (RPLS) with recursive feature elimination (RFE) and novel Brier and occurrence probability measures. It facilitates the preliminary identification of meaningful pathways and genes for a specific disease, rather than focusing on selecting a small set of genes for classification purposes as in conventional studies. First, RFE and a novel Brier error measure are incorporated in RPLS to reduce the estimation variance using a two-nested cross validation (CV) approach. Second, novel Brier and occurrence probability-based measures are employed in ranking genes across different CV subsamples. This helps to detect different GRNs from correlated genes that consistently appear in the ranking lists. Therefore, unlike most conventional approaches that emphasize the best classification using a small gene set, the proposed approach is able to simultaneously offer good classification accuracy and identify a more comprehensive set of genes and their associated GRNs. Experimental results on the analysis of three publicly available cancer data sets, namely leukemia, colon, and prostate, show that very stable gene sets from different but relevant GRNs can be identified, and most of them are found to be of biological significance according to previous findings in biological experiments. These results suggest that the proposed approach may serve as a useful tool for preliminary identification of genes and their associated GRNs of a particular disease for further biological studies using microarray or similar data.
    IEEE Transactions on Systems Man and Cybernetics - Part A Systems and Humans 11/2012; 42(6):1514-1528. DOI:10.1109/TSMCA.2012.2199302 · 2.18 Impact Factor
