Defining diversity, specialization, and gene specificity in transcriptomes through information theory

Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Cinvestav, Campus Guanajuato, Apartado Postal 629, C.P. 36500 Irapuato, Guanajuato, Mexico.
Proceedings of the National Academy of Sciences (Impact Factor: 9.67). 08/2008; 105(28):9709-14. DOI: 10.1073/pnas.0803479105
Source: PubMed


The transcriptome is the set of genes transcribed in a given tissue under specific conditions and can be characterized by a list of genes with their corresponding frequencies of transcription. Transcriptome changes can be measured by counting gene tags from mRNA libraries or by measuring light signals in DNA microarrays. In either case, it is difficult to fully comprehend the global changes that occur in the transcriptome, given that thousands of gene expression measurements are involved. We propose an approach to define and estimate the diversity and specialization of transcriptomes and the specificity of genes. We define transcriptome diversity as the Shannon entropy of its frequency distribution. Gene specificity is defined as the mutual information between the tissues and the corresponding transcript, allowing detection of either housekeeping or highly specific genes and clarifying the meaning of these concepts in the literature. Tissue specialization is measured by average gene specificity. We introduce the formulae using a simple example and show their application in two datasets of gene expression in human tissues. Plotting transcriptomes in a system of diversity and specialization coordinates makes their interrelations visible at a glance, showing which transcriptomes are richer in diversity of expressed genes and which are relatively more specialized. The framework clarifies the relationships among transcriptomes, allowing a better understanding of their changes during the development of the organism or in response to environmental stimuli.
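The abstract names the three quantities but does not reproduce the formulae, so the following is only a minimal sketch under stated assumptions: all tissues are weighted equally, gene specificity is computed as the average over tissues of (p_ij / p_i) log2(p_ij / p_i), where p_ij is the relative frequency of gene i in tissue j and p_i is its mean across tissues, and specialization is the frequency-weighted mean of gene specificity within a tissue. The function name `transcriptome_metrics` and the toy count matrix are hypothetical and chosen for illustration.

```python
import numpy as np


def transcriptome_metrics(counts):
    """Return (diversity, specificity, specialization) for a genes x tissues matrix.

    Assumes every tissue (column) has at least one non-zero count and that
    all tissues carry equal weight; both are assumptions of this sketch.
    """
    counts = np.asarray(counts, dtype=float)

    # Relative frequency of each gene within each tissue (columns sum to 1).
    p = counts / counts.sum(axis=0, keepdims=True)

    with np.errstate(divide="ignore", invalid="ignore"):
        # Diversity: Shannon entropy (bits) of each tissue's distribution,
        # using the convention 0 * log2(0) = 0.
        plogp = np.where(p > 0, p * np.log2(p), 0.0)
        diversity = -plogp.sum(axis=0)

        # Mean frequency of each gene across tissues.
        p_mean = p.mean(axis=1, keepdims=True)

        # Gene specificity: average over tissues of r * log2(r), r = p_ij / p_i.
        ratio = np.where(p_mean > 0, p / p_mean, 0.0)
        rlogr = np.where(ratio > 0, ratio * np.log2(ratio), 0.0)
        specificity = rlogr.mean(axis=1)

    # Specialization: gene specificity averaged with the tissue's own
    # gene frequencies as weights.
    specialization = (p * specificity[:, None]).sum(axis=0)
    return diversity, specificity, specialization


# Toy data: rows are genes A, B, C; columns are two tissues.
# Gene A is expressed equally in both tissues; B and C are tissue-exclusive.
toy_counts = [
    [5, 5],
    [5, 0],
    [0, 5],
]
diversity, specificity, specialization = transcriptome_metrics(toy_counts)
print("diversity:", diversity)
print("specificity:", specificity)
print("specialization:", specialization)
```

In this toy case each tissue expresses two genes at equal frequency, so its diversity is 1 bit. The evenly expressed gene A has specificity 0, while the tissue-exclusive genes B and C reach the maximum for two tissues, log2(2) = 1, which gives each tissue a specialization of 0.5.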



Available from: Octavio Martínez
    • "RE insertion close to a lincRNA promoter or RE hypermethylation might interrupt the transcription factors or other regulatory elements binding to lincRNA promoters, which could contribute to lincRNAs tissue-specific expression. We quantified the tissue specificity of lincRNA expression in 16 normal tissues (SRA, E-MTAB-513) and six cell lines (GEO, GSE23316) using an information theory method (Supplementary file) (44). CM lincRNAs had significantly higher tissue-specific expression than RM lincRNAs, which was consistent with our hypothesis (Figure 3B). "
    ABSTRACT: Despite growing consensus that long intergenic non-coding ribonucleic acids (lincRNAs) are modulators of cancer, the knowledge about the deoxyribonucleic acid (DNA) methylation patterns of lincRNAs in cancers remains limited. In this study, we constructed DNA methylation profiles for 4629 tumors and 705 normal tissue samples from 20 different types of human cancer by reannotating data of DNA methylation arrays. We found that lincRNAs had different promoter methylation patterns in cancers. We classified 2461 lincRNAs into two categories and three subcategories, according to their promoter methylation patterns in tumors. LincRNAs with resistant methylation patterns in tumors had conserved transcriptional regulation regions and were ubiquitously expressed across normal tissues. By integrating cancer subtype data and patient clinical information, we identified lincRNAs with promoter methylation patterns that were associated with cancer status, subtype or prognosis for several cancers. Network analysis of aberrantly methylated lincRNAs in cancers showed that lincRNAs with aberrant methylation patterns might be involved in cancer development and progression. The methylated and demethylated lincRNAs identified in this study provide novel insights for developing cancer biomarkers and potential therapeutic targets.
    Full-text · Article · Jul 2014 · Nucleic Acids Research
  • Source
    • "Gene specificity. An additional parameter to measure global properties of the transcriptome, apart from diversity and specialization, is the specificity of the genes [16]. Gene specificity, S, is a coefficient which yields a value of 0 for genes which are equally expressed during each of the time points sampled and reaches a maximum value of log2(k) where k, "

    Full-text · Dataset · Mar 2014
  • Source
    • "We used a method based on information theory to directly quantify the degree of tissue-specificity in a given gene's expression pattern across nine normal tissues that were profiled by high-throughput mRNA sequencing (RNA-seq, Sequence Read Archive, SRA:SRA008403) [33-35], with a higher score equating to a more tissue-specific pattern of expression. Hypermethylation prone genes were significantly more tissue-specific than hypermethylation resistant genes (Figure 2b). "
    ABSTRACT: Background Aberrant CpG island promoter DNA hypermethylation is frequently observed in cancer and is believed to contribute to tumor progression by silencing the expression of tumor suppressor genes. Previously, we observed that promoter hypermethylation in breast cancer reflects cell lineage rather than tumor progression and occurs at genes that are already repressed in a lineage-specific manner. To investigate the generality of our observation we analyzed the methylation profiles of 1,154 cancers from 7 different tissue types. Results We find that 1,009 genes are prone to hypermethylation in these 7 types of cancer. Nearly half of these genes varied in their susceptibility to hypermethylation between different cancer types. We show that the expression status of hypermethylation prone genes in the originator tissue determines their propensity to become hypermethylated in cancer; specifically, genes that are normally repressed in a tissue are prone to hypermethylation in cancers derived from that tissue. We also show that the promoter regions of hypermethylation-prone genes are depleted of repetitive elements and that DNA sequence around the same promoters is evolutionarily conserved. We propose that these two characteristics reflect tissue-specific gene promoter architecture regulating the expression of these hypermethylation prone genes in normal tissues. Conclusions As aberrantly hypermethylated genes are already repressed in pre-cancerous tissue, we suggest that their hypermethylation does not directly contribute to cancer development via silencing. Instead aberrant hypermethylation reflects developmental history and the perturbation of epigenetic mechanisms maintaining these repressed promoters in a hypomethylated state in normal cells.
    Full-text · Article · Oct 2012 · Genome biology