Assessing the conservation of mammalian gene expression using high-density exon arrays
ABSTRACT Microarray data from multiple species have been used to study evolutionary constraints on gene expression. Expression measurements from conventional microarray platforms such as the 3' expression arrays are strongly affected by platform-dependent probe effects that may introduce apparent but misleading discrepancies between species. In this manuscript, we assess the conservation of mammalian gene expression in adult tissues using data from a high-density exon array platform. The exon arrays have more than 6 million probes on a single array targeting all exons in a genome. We find that, unlike 3' array data, gene expression measurements from exon arrays reveal patterns of gene expression that are highly conserved between humans and mice in multiple tissues. Our analysis provides strong evidence for widespread stabilizing selection pressure on transcript abundance during mammalian evolution.
- "The two most common measures of similarity between expression profiles of orthologous genes are Pearson's correlation coefficient (Chan et al., 2009; Liao and Zhang, 2006a, b; Xing et al., 2007; Yanai et al., 2004; Yang et al., 2005; Zheng-Bradley et al., 2010) and Euclidean distance (Jordan et al., 2005; Liao and Zhang, 2006a; Yanai et al., 2004). The results obtained with Pearson's and Euclidean distances have been reported to be poorly correlated (Liao and Zhang, 2006a; Pereira et al., 2009). "
ABSTRACT: Motivation: Comparative analyses of gene expression data from different species have become an important component of the study of molecular evolution. Thus methods are needed to estimate evolutionary distances between expression profiles, as well as a neutral reference to estimate selective pressure. Divergence between expression profiles of homologous genes is often calculated with Pearson's or Euclidean distance. Neutral divergence is usually inferred from randomized data. Despite being widely used, neither of these two steps has been well studied. Here, we analyze these methods formally and on real data, highlight their limitations and propose improvements. Results: It has been demonstrated that Pearson's distance, in contrast to Euclidean distance, leads to underestimation of the expression similarity between homologous genes with a conserved uniform pattern of expression. Here, we first extend this study to genes with conserved, but specific pattern of expression. Surprisingly, we find that both Pearson's and Euclidean distances used as a measure of expression similarity between genes depend on the expression specificity of those genes. We also show that the Euclidean distance depends strongly on data normalization. Next, we show that the randomization procedure that is widely used to estimate the rate of neutral evolution is biased when broadly expressed genes are abundant in the data. To overcome this problem, we propose a novel randomization procedure that is unbiased with respect to expression profiles present in the datasets. Applying our method to the mouse and human gene expression data suggests significant gene expression conservation between these species. Contact: email@example.com; firstname.lastname@example.org Supplementary information: Supplementary data are available at Bioinformatics online.
Bioinformatics 05/2012; 28(14):1865-72. DOI:10.1093/bioinformatics/bts266 · 4.62 Impact Factor
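The contrast between the two distance measures described in this abstract can be illustrated with a minimal sketch. The expression values below are hypothetical (not taken from any dataset), and Pearson's distance is assumed here to be defined as 1 - r; the cited work may use a different convention. The example shows an ortholog pair with a nearly uniform, conserved profile: the Euclidean distance is small, while Pearson's distance is large because only noise is left to correlate.

```python
import math

def pearson_distance(x, y):
    """Return 1 - r, where r is Pearson's correlation between two profiles.

    Assumption: distance is defined as 1 - r (range 0 to 2).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1 - cov / (sx * sy)

def euclidean_distance(x, y):
    """Return the Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical log-scale expression of one ortholog pair in six tissues.
# Both genes are expressed at about the same level everywhere.
human = [8.0, 8.1, 7.9, 8.0, 8.1, 7.9]
mouse = [8.0, 7.9, 8.1, 8.1, 7.9, 8.0]

d_pearson = pearson_distance(human, mouse)
d_euclid = euclidean_distance(human, mouse)
print(f"Pearson distance:   {d_pearson:.2f}")
print(f"Euclidean distance: {d_euclid:.2f}")
```

In this toy case the two profiles differ by at most 0.2 units in any tissue, yet Pearson's distance lands near the dissimilar end of its range, which is the kind of underestimated similarity the abstract attributes to uniformly expressed genes.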
- "bp, Figure S3 and Figure S4) are expressed in significantly fewer tissues than SCGI genes are, but in a significantly greater number of tissues than NCGI genes are (Figure 2A, Figure S3, and Figure S4). We observe the same trends using data from exon microarrays (Xing et al. 2007). We use the metric 'tissue specificity index' for comparing gene expression breadths from exon microarrays (i.e., Liao et al. 2006). "
ABSTRACT: CpG islands mark CpG-enriched regions in otherwise CpG-depleted vertebrate genomes. While the regulatory importance of CpG islands is widely accepted, it is little appreciated that CpG islands vary greatly in lengths. For example, CpG islands in the human genome vary ∼30-fold in their lengths. Here we report findings suggesting that the lengths of CpG islands have functional consequences. Specifically, we show that promoters associated with long CpG islands (long-CGI promoters) are distinct from other promoters. First, long-CGI promoters are uniquely associated with genes with an intermediate level of gene expression breadths. Notably, intermediate expression breadths require the most complex mode of gene regulation, from the standpoint of information content. Second, long-CGI promoters encode more RNA polymerase II (Polr2a) binding sites than other promoters. Third, the actual binding patterns of Polr2a occur in a more tissue-specific manner in long-CGI promoters compared to other CGI promoters. Moreover, long-CGI promoters contain the largest numbers of experimentally characterized transcription start sites compared to other promoters, and the types of transcription start sites in them are biased toward tissue-specific patterns of gene expression. Finally, long-CGI promoters are preferentially associated with genes involved in development and regulation. Together, these findings indicate that functionally relevant variations of CpG islands exist. By investigating consequences of certain CpG island traits, we can gain additional insights into the mechanism and evolution of regulatory complexity of gene expression.
Genetics 02/2011; 187(4):1077-83. DOI:10.1534/genetics.110.126094 · 4.87 Impact Factor
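The excerpt for this entry mentions a "tissue specificity index" for comparing expression breadths but does not give its formula. As a rough illustration only, the sketch below uses the tau index that is common in the expression-breadth literature; this is an assumption, and the index actually used in the cited study may be defined differently. The function name and the input values are hypothetical.

```python
def tau(profile):
    """Tau specificity index for one gene across tissues.

    Assumed definition: sum(1 - x_i / max(x)) / (n - 1), for non-negative
    expression values. 0 means uniform expression across tissues,
    1 means expression in a single tissue.
    """
    n = len(profile)
    peak = max(profile)
    if n < 2:
        raise ValueError("need at least two tissues")
    if peak <= 0 or min(profile) < 0:
        raise ValueError("expression values must be non-negative with a positive maximum")
    return sum(1 - value / peak for value in profile) / (n - 1)

# Hypothetical profiles over four tissues.
broad = [5.0, 5.0, 5.0, 5.0]      # same level everywhere
specific = [0.0, 0.0, 0.0, 9.0]   # expressed in one tissue only
midrange = [9.0, 9.0, 0.0, 0.0]   # expressed in half of the tissues

print(tau(broad), tau(specific), tau(midrange))
```

Under this definition, genes with intermediate expression breadth, the category the abstract associates with long-CGI promoters, receive values between the two extremes.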
- "The gene expression levels were determined with reference to the data set downloaded from http://biogibbs.stanford.edu/~yxing/MBE/. This data set was generated by examining the transcriptomes of six human tissues (heart, kidney, liver, muscle, spleen, and testis) using a high-density exon array platform (Xing et al. 2007). The expression level of a gene was defined as the average signal intensity across these six examined tissues. "
ABSTRACT: The evolution of duplicate genes has been a topic of broad interest. Here, we propose that the conservation of gene family size is a good indicator of the rate of sequence evolution and some other biological properties. By comparing the human-chimpanzee-macaque orthologous gene families with and without family size conservation, we demonstrate that genes with family size conservation evolve more slowly than those without family size conservation. Our results further demonstrate that both family expansion and contraction events may accelerate gene evolution, resulting in elevated evolutionary rates in the genes without family size conservation. In addition, we show that the duplicate genes with family size conservation evolve significantly more slowly than those without family size conservation. Interestingly, the median evolutionary rate of singletons falls in between those of the above two types of duplicate gene families. Our results thus suggest that the controversy on whether duplicate genes evolve more slowly than singletons can be resolved when family size conservation is taken into consideration. Furthermore, we also observe that duplicate genes with family size conservation have the highest level of gene expression/expression breadth, the highest proportion of essential genes, and the lowest gene compactness, followed by singletons and then by duplicate genes without family size conservation. Such a trend accords well with our observations of evolutionary rates. Our results thus point to the importance of family size conservation in the evolution of duplicate genes.
Molecular Biology and Evolution 03/2010; 27(8):1750-8. DOI:10.1093/molbev/msq055 · 14.31 Impact Factor
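The excerpt for this entry defines a gene's expression level as the average signal intensity across the six examined tissues. A minimal sketch of that calculation follows; the function name, the dictionary layout, and the intensity values are hypothetical and are not drawn from the downloaded data set.

```python
# The six tissues named in the excerpt.
TISSUES = ("heart", "kidney", "liver", "muscle", "spleen", "testis")

def expression_level(signal_by_tissue):
    """Average signal intensity of one gene across the six tissues.

    `signal_by_tissue` maps tissue name to signal intensity (assumed layout).
    """
    missing = [t for t in TISSUES if t not in signal_by_tissue]
    if missing:
        raise KeyError(f"missing tissues: {missing}")
    return sum(signal_by_tissue[t] for t in TISSUES) / len(TISSUES)

# Hypothetical signal intensities for a single gene.
example_gene = {
    "heart": 120.0,
    "kidney": 80.0,
    "liver": 200.0,
    "muscle": 100.0,
    "spleen": 60.0,
    "testis": 40.0,
}

level = expression_level(example_gene)
print(level)
```

Averaging in this way gives a single per-gene summary, at the cost of hiding how unevenly the signal is distributed among tissues.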