Defining diversity, specialization, and gene specificity in transcriptomes through information theory

Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Cinvestav, Campus Guanajuato, Apartado Postal 629, C.P. 36500 Irapuato, Guanajuato, Mexico.
Proceedings of the National Academy of Sciences (Impact Factor: 9.67). 08/2008; 105(28):9709-14. DOI: 10.1073/pnas.0803479105
Source: PubMed


The transcriptome is the set of genes transcribed in a given tissue under specific conditions and can be characterized by a list of genes with their corresponding frequencies of transcription. Transcriptome changes can be measured by counting gene tags from mRNA libraries or by measuring light signals in DNA microarrays. In either case, it is difficult to completely comprehend the global changes that occur in the transcriptome, given that thousands of gene expression measurements are involved. We propose an approach to define and estimate the diversity and specialization of transcriptomes and gene specificity. We define transcriptome diversity as the Shannon entropy of its frequency distribution. Gene specificity is defined as the mutual information between the tissues and the corresponding transcript, allowing detection of either housekeeping or highly specific genes and clarifying the meaning of these concepts in the literature. Tissue specialization is measured by average gene specificity. We introduce the formulae using a simple example and show their application in two datasets of gene expression in human tissues. Visualization of the positions of transcriptomes in a system of diversity and specialization coordinates makes it possible to understand at a glance their interrelations, summarizing in a powerful way which transcriptomes are richer in diversity of expressed genes, or which are relatively more specialized. The framework presented clarifies the relations among transcriptomes, allowing a better understanding of their changes through the development of the organism or in response to environmental stimuli.
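
The three quantities defined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the following reading of the definitions: diversity of tissue j is the Shannon entropy (base 2) of its relative gene frequencies p_ij; specificity of gene i is the average over the t tissues of (p_ij / p_i) log2(p_ij / p_i), where p_i is the mean frequency of gene i across tissues; and specialization of tissue j is the frequency-weighted average of the gene specificities. The function name `transcriptome_metrics` and the toy counts are hypothetical.

```python
import math


def _xlog2x(x):
    """Return x * log2(x), using the convention 0 * log2(0) = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def transcriptome_metrics(counts):
    """counts[j][i] is the expression count of gene i in tissue j.

    Returns (diversity per tissue, specificity per gene,
    specialization per tissue), all in bits.
    Assumes every tissue has at least one non-zero count.
    """
    t = len(counts)      # number of tissues
    g = len(counts[0])   # number of genes

    # Relative frequency of each gene within each tissue.
    p = [[c / sum(row) for c in row] for row in counts]

    # Diversity: Shannon entropy of each tissue's frequency distribution.
    diversity = [-sum(_xlog2x(x) for x in row) for row in p]

    # Mean frequency of each gene across tissues.
    mean_p = [sum(p[j][i] for j in range(t)) / t for i in range(g)]

    # Specificity: 0 for uniformly expressed genes, log2(t) for genes
    # expressed in a single tissue (assumed form of the definition).
    specificity = [
        sum(_xlog2x(p[j][i] / mean_p[i]) for j in range(t)) / t
        if mean_p[i] > 0 else 0.0
        for i in range(g)
    ]

    # Specialization: average gene specificity weighted by frequency.
    specialization = [
        sum(p[j][i] * specificity[i] for i in range(g)) for j in range(t)
    ]
    return diversity, specificity, specialization


# Toy data: gene 0 is shared, genes 1 and 2 are each tissue-specific.
toy_counts = [[50, 50, 0],
              [50, 0, 50]]
diversity, specificity, specialization = transcriptome_metrics(toy_counts)
```

In this toy case the shared gene gets specificity 0 and each tissue-specific gene gets the maximum for two tissues, log2(2) = 1 bit, which matches the range described in the citing excerpts further down this page.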



Available from: Octavio Martínez
  • Source
    • "RE insertion close to a lincRNA promoter or RE hypermethylation might interrupt the transcription factors or other regulatory elements binding to lincRNA promoters, which could contribute to lincRNAs tissue-specific expression. We quantified the tissue specificity of lincRNA expression in 16 normal tissues (SRA, E-MTAB-513) and six cell lines (GEO, GSE23316) using an information theory method (Supplementary file) (44). CM lincRNAs had significantly higher tissue-specific expression than RM lincRNAs, which was consistent with our hypothesis (Figure 3B). "
    ABSTRACT: Despite growing consensus that long intergenic non-coding ribonucleic acids (lincRNAs) are modulators of cancer, the knowledge about the deoxyribonucleic acid (DNA) methylation patterns of lincRNAs in cancers remains limited. In this study, we constructed DNA methylation profiles for 4629 tumors and 705 normal tissue samples from 20 different types of human cancer by reannotating data of DNA methylation arrays. We found that lincRNAs had different promoter methylation patterns in cancers. We classified 2461 lincRNAs into two categories and three subcategories, according to their promoter methylation patterns in tumors. LincRNAs with resistant methylation patterns in tumors had conserved transcriptional regulation regions and were ubiquitously expressed across normal tissues. By integrating cancer subtype data and patient clinical information, we identified lincRNAs with promoter methylation patterns that were associated with cancer status, subtype or prognosis for several cancers. Network analysis of aberrantly methylated lincRNAs in cancers showed that lincRNAs with aberrant methylation patterns might be involved in cancer development and progression. The methylated and demethylated lincRNAs identified in this study provide novel insights for developing cancer biomarkers and potential therapeutic targets.
    Nucleic Acids Research 07/2014; 42(13). DOI:10.1093/nar/gku575 · 9.11 Impact Factor
  • Source
    • "Gene specificity. An additional parameter to measure global properties of the transcriptome, apart from diversity and specialization, is the specificity of the genes [16]. Gene specificity, S, is a coefficient which yields a value of 0 for genes which are equally expressed during each of the time points sampled and reaches a maximum value of log2(k) where k, "

  • Source
    • "Box plots were generated comparing classes of genes (including all the genes on the deluxe promoter array to all 5hmC-marked TSS region genes). The specificity of a gene's expression pattern was measured by using a method based on information theory outlined by Martinez et al. [72]. A low score (0) indicates that a gene is uniformly expressed, and a high score (6.2) indicates that it is expressed specifically in one tissue. "
    ABSTRACT: Background Induction and promotion of liver cancer by exposure to non-genotoxic carcinogens coincides with epigenetic perturbations, including specific changes in DNA methylation. Here we investigate the genome-wide dynamics of 5-hydroxymethylcytosine (5hmC) as a likely intermediate of 5-methylcytosine (5mC) demethylation in a DNA methylation reprogramming pathway. We use a rodent model of non-genotoxic carcinogen exposure using the drug phenobarbital. Results Exposure to phenobarbital results in dynamic and reciprocal changes to the 5mC/5hmC patterns over the promoter regions of a cohort of genes that are transcriptionally upregulated. This reprogramming of 5mC/5hmC coincides with characteristic changes in the histone marks H3K4me2, H3K27me3 and H3K36me3. Quantitative analysis of phenobarbital-induced genes that are involved in xenobiotic metabolism reveals that both DNA modifications are lost at the transcription start site, while there is a reciprocal relationship between increasing levels of 5hmC and loss of 5mC at regions immediately adjacent to core promoters. Conclusions Collectively, these experiments support the hypothesis that 5hmC is a potential intermediate in a demethylation pathway and reveal precise perturbations of the mouse liver DNA methylome and hydroxymethylome upon exposure to a rodent hepatocarcinogen.
    Genome biology 10/2012; 13(10):R93. DOI:10.1186/gb-2012-13-10-r93 · 10.81 Impact Factor