Defining diversity, specialization, and gene specificity in transcriptomes through information theory.

Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Cinvestav, Campus Guanajuato, Apartado Postal 629, C.P. 36500 Irapuato, Guanajuato, Mexico.
Proceedings of the National Academy of Sciences (Impact Factor: 9.81). 08/2008; 105(28):9709-14. DOI: 10.1073/pnas.0803479105
Source: PubMed

ABSTRACT The transcriptome is the set of genes transcribed in a given tissue under specific conditions and can be characterized by a list of genes with their corresponding frequencies of transcription. Transcriptome changes can be measured by counting gene tags from mRNA libraries or by measuring light signals in DNA microarrays. In either case, it is difficult to fully comprehend the global changes that occur in the transcriptome, given that thousands of gene expression measurements are involved. We propose an approach to define and estimate the diversity and specialization of transcriptomes and gene specificity. We define transcriptome diversity as the Shannon entropy of its frequency distribution. Gene specificity is defined as the mutual information between the tissues and the corresponding transcript, allowing detection of either housekeeping or highly specific genes and clarifying the meaning of these concepts in the literature. Tissue specialization is measured by average gene specificity. We introduce the formulae using a simple example and show their application in two datasets of gene expression in human tissues. Visualizing the positions of transcriptomes in a system of diversity and specialization coordinates makes it possible to understand their interrelations at a glance, showing which transcriptomes are richer in diversity of expressed genes and which are relatively more specialized. The framework presented clarifies the relationships among transcriptomes, allowing a better understanding of their changes through the development of the organism or in response to environmental stimuli.
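
The three quantities named in the abstract can be illustrated with a minimal Python sketch. It is one reading of the verbal definitions, not the authors' implementation: the abstract does not give the formulae, so the exact forms below are assumptions. Diversity is taken as the Shannon entropy in bits of each tissue's frequency vector, gene specificity as the average over tissues of (p_ij / p_i) log2(p_ij / p_i) with p_i the mean frequency of gene i across tissues, and specialization as the frequency-weighted average of gene specificities. All function names and the toy count matrix are hypothetical.

```python
import math

def to_frequencies(counts):
    # counts[j][i] is the tag count of gene i in tissue j (toy layout, an assumption).
    # Each tissue row is normalized so that its frequencies sum to 1.
    freqs = []
    for row in counts:
        total = sum(row)
        freqs.append([c / total for c in row])
    return freqs

def diversity(p_row):
    # Shannon entropy (bits) of one transcriptome's frequency distribution.
    return -sum(p * math.log2(p) for p in p_row if p > 0)

def gene_specificity(freqs):
    # Assumed form: S_i = (1/t) * sum_j (p_ij / p_i) * log2(p_ij / p_i),
    # where p_i is the mean frequency of gene i over the t tissues.
    t = len(freqs)
    n_genes = len(freqs[0])
    spec = []
    for i in range(n_genes):
        mean_p = sum(freqs[j][i] for j in range(t)) / t
        s = 0.0
        if mean_p > 0:
            for j in range(t):
                ratio = freqs[j][i] / mean_p
                if ratio > 0:  # terms with zero frequency contribute nothing
                    s += ratio * math.log2(ratio)
            s /= t
        spec.append(s)
    return spec

def specialization(p_row, spec):
    # Average gene specificity, weighted by the tissue's own frequencies.
    return sum(p * s for p, s in zip(p_row, spec))

# Toy data: gene 0 is expressed equally in both tissues (housekeeping-like),
# genes 1 and 2 are each expressed in only one tissue.
counts = [[50, 50, 0],
          [50, 0, 50]]
freqs = to_frequencies(counts)
spec = gene_specificity(freqs)
diversities = [diversity(row) for row in freqs]
specializations = [specialization(row, spec) for row in freqs]
```

In this toy case the evenly expressed gene has specificity 0 and each single-tissue gene has specificity 1 bit, which is log2 of the number of tissues and therefore the maximum under this definition. Plotting `diversities` against `specializations` gives the kind of two-coordinate view the abstract describes.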

  • Source
    ABSTRACT: Despite growing consensus that long intergenic non-coding ribonucleic acids (lincRNAs) are modulators of cancer, the knowledge about the deoxyribonucleic acid (DNA) methylation patterns of lincRNAs in cancers remains limited. In this study, we constructed DNA methylation profiles for 4629 tumors and 705 normal tissue samples from 20 different types of human cancer by reannotating data of DNA methylation arrays. We found that lincRNAs had different promoter methylation patterns in cancers. We classified 2461 lincRNAs into two categories and three subcategories, according to their promoter methylation patterns in tumors. LincRNAs with resistant methylation patterns in tumors had conserved transcriptional regulation regions and were ubiquitously expressed across normal tissues. By integrating cancer subtype data and patient clinical information, we identified lincRNAs with promoter methylation patterns that were associated with cancer status, subtype or prognosis for several cancers. Network analysis of aberrantly methylated lincRNAs in cancers showed that lincRNAs with aberrant methylation patterns might be involved in cancer development and progression. The methylated and demethylated lincRNAs identified in this study provide novel insights for developing cancer biomarkers and potential therapeutic targets.
    Nucleic Acids Research 07/2014; · 8.81 Impact Factor
  • Source
    ABSTRACT: Parasite biology, by its very nature, cannot be understood without integrating it with that of the host, nor can the host response be adequately explained without considering the activity of the parasite. However, due to experimental limitations, molecular studies of parasite-host systems have been predominantly one-sided investigations focusing on either of the partners involved. Here, we conducted a dual RNA-seq time course analysis of filarial worm parasite and host mosquito to better understand the parasite processes underlying development in and interaction with the host tissue, from the establishment of infection to the development of infective-stage larva.
    PLoS Neglected Tropical Diseases 05/2014; 8(5):e2905. · 4.49 Impact Factor
  • Source
    ABSTRACT: A gene expression atlas is an essential resource to quantify and understand the multiscale processes of embryogenesis in time and space. The automated reconstruction of a prototypic 4D atlas for vertebrate early embryos, using multicolor fluorescence in situ hybridization with nuclear counterstain, requires dedicated computational strategies. To this goal, we designed an original methodological framework implemented in a software tool called Match-IT. With only minimal human supervision, our system is able to gather gene expression patterns observed in different analyzed embryos with phenotypic variability and map them onto a series of common 3D templates over time, creating a 4D atlas. This framework was used to construct an atlas composed of 6 gene expression templates from a cohort of zebrafish early embryos spanning 6 developmental stages from 4 to 6.3 hpf (hours post fertilization). They included 53 specimens, 181,415 detected cell nuclei and the segmentation of 98 gene expression patterns observed in 3D for 9 different genes. In addition, an interactive visualization software, Atlas-IT, was developed to inspect, supervise and analyze the atlas. Match-IT and Atlas-IT, including user manuals, representative datasets and video tutorials, are publicly and freely available online. We also propose computational methods and tools for the quantitative assessment of the gene expression templates at the cellular scale, with the identification, visualization and analysis of coexpression patterns, synexpression groups and their dynamics through developmental stages.
    PLoS Computational Biology 06/2014; 10(6):e1003670. · 4.83 Impact Factor
