Defining diversity, specialization, and gene specificity in transcriptomes through information theory.

Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Cinvestav, Campus Guanajuato, Apartado Postal 629, C.P. 36500 Irapuato, Guanajuato, Mexico.
Proceedings of the National Academy of Sciences (Impact Factor: 9.81). 08/2008; 105(28):9709-14. DOI: 10.1073/pnas.0803479105
Source: PubMed

ABSTRACT The transcriptome is a set of genes transcribed in a given tissue under specific conditions and can be characterized by a list of genes with their corresponding frequencies of transcription. Transcriptome changes can be measured by counting gene tags from mRNA libraries or by measuring light signals in DNA microarrays. In any case, it is difficult to completely comprehend the global changes that occur in the transcriptome, given that thousands of gene expression measurements are involved. We propose an approach to define and estimate the diversity and specialization of transcriptomes and gene specificity. We define transcriptome diversity as the Shannon entropy of its frequency distribution. Gene specificity is defined as the mutual information between the tissues and the corresponding transcript, allowing detection of either housekeeping or highly specific genes and clarifying the meaning of these concepts in the literature. Tissue specialization is measured by average gene specificity. We introduce the formulae using a simple example and show their application in two datasets of gene expression in human tissues. Visualization of the positions of transcriptomes in a system of diversity and specialization coordinates makes it possible to understand at a glance their interrelations, summarizing in a powerful way which transcriptomes are richer in diversity of expressed genes, or which are relatively more specialized. The framework presented enlightens the relation among transcriptomes, allowing a better understanding of their changes through the development of the organism or in response to environmental stimuli.
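The three measures named in the abstract can be sketched in a few lines of Python. The abstract gives only verbal definitions, so the formulae below are one plausible reading and should be treated as assumptions: entropy is taken in bits (log base 2), tissues are weighted equally when averaging a gene's frequency, gene specificity is the per-gene mutual-information term averaged over tissues, and specialization is the frequency-weighted mean of gene specificity. The function names and the toy data are hypothetical.

```python
import math

def diversity(freqs_one_tissue):
    """Shannon entropy (bits) of one tissue's relative transcript frequencies."""
    return -sum(p * math.log2(p) for p in freqs_one_tissue if p > 0)

def gene_specificity(freqs):
    """freqs[j][i] is the relative frequency of gene i in tissue j.

    Assumption: tissues are equally weighted. Returns one value per gene,
    0 for a gene expressed at the same frequency everywhere and
    log2(number of tissues) for a gene expressed in a single tissue.
    """
    n_tissues = len(freqs)
    n_genes = len(freqs[0])
    result = []
    for i in range(n_genes):
        mean_p = sum(freqs[j][i] for j in range(n_tissues)) / n_tissues
        total = 0.0
        for j in range(n_tissues):
            p = freqs[j][i]
            if p > 0:
                ratio = p / mean_p
                total += ratio * math.log2(ratio)
        result.append(total / n_tissues)
    return result

def specialization(freqs):
    """Average gene specificity of each tissue, weighted by its frequencies."""
    spec = gene_specificity(freqs)
    return [sum(p * s for p, s in zip(row, spec)) for row in freqs]

# Toy data (hypothetical): two tissues, three genes.
# Gene 0 is expressed equally in both; genes 1 and 2 are tissue-exclusive.
toy = [
    [0.5, 0.5, 0.0],  # tissue A
    [0.5, 0.0, 0.5],  # tissue B
]
print([diversity(row) for row in toy])
print(gene_specificity(toy))
print(specialization(toy))
```

In this toy case the shared gene behaves like a housekeeping gene (specificity 0), while each tissue-exclusive gene reaches the maximum for two tissues (1 bit). Plotting each tissue's diversity against its specialization gives the kind of coordinate view the abstract describes.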


Available from: Octavio Martínez, Jul 01, 2015
  • Source
    ABSTRACT: In the postgenome era many efforts have been dedicated to systematically elucidate the complex web of interacting genes and proteins. These efforts include experimental and computational methods. Microarray technology offers an opportunity for monitoring gene expression level at the genome scale. By recourse to information theory, this study proposes a mathematical approach to reconstruct gene regulatory networks at a coarse-grain level from high throughput gene expression data. The method provides the a posteriori probability that a given gene regulates positively, negatively or does not regulate each one of the network genes. This approach also allows the introduction of prior knowledge and the quantification of the information gain from experimental data used in the inference procedure. This information gain can be used to choose those genes that will be perturbed in subsequent experiments in order to refine our knowledge about the architecture of an underlying gene regulatory network. The performance of the proposed approach has been studied by in numero experiments. Our results suggest that the approach is suitable for focusing on size-limited problems, such as recovering a small subnetwork of interest by performing perturbation over selected genes.
    06/2011; 390(11-11):2198-2207. DOI:10.1016/j.physa.2011.02.021
  • Source
    ABSTRACT: In the postgenome era many efforts have been dedicated to systematically elucidate the complex web of interacting genes and proteins. These efforts include experimental and computational methods. Microarray technology offers an opportunity for monitoring gene expression level at the genome scale. By recourse to information theory, this study proposes a mathematical approach to reconstruct gene regulatory networks at a coarse-grain level from high throughput gene expression data. The method provides the a posteriori probability that a given gene regulates positively, negatively or does not regulate each one of the network genes. This approach also allows the introduction of prior knowledge and the quantification of the information gain from experimental data used in the inference procedure. This information gain can be used to choose genes to be perturbed in subsequent experiments in order to refine the knowledge about the architecture of an underlying gene regulatory network. The performance of the proposed approach has been studied by in numero experiments. Our results suggest that the approach is suitable for focusing on size-limited problems, such as recovering a small subnetwork of interest by performing perturbation over selected genes. Comment: 17 pages, 4 figures