Article

Bovine ncRNAs Are Abundant, Primarily Intergenic, Conserved and Associated with Regulatory Genes

School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, South Australia, Australia.
PLoS ONE (Impact Factor: 3.53). 08/2012; 7(8). DOI: 10.1371/journal.pone.0042638
Source: PubMed

ABSTRACT It is apparent that non-coding transcripts are a common feature of higher organisms and encode uncharacterized layers of genetic regulation and information. We used public bovine EST data from many developmental stages and tissues, and developed a pipeline for the genome wide identification and annotation of non-coding RNAs (ncRNAs). We have predicted 23,060 bovine ncRNAs, 99% of which are un-annotated, based on known ncRNA databases. Intergenic transcripts accounted for the majority (57%) of the predicted ncRNAs and the occurrence of ncRNAs and genes were only moderately correlated (r = 0.55, p-value,2.2e-16). Many of these intergenic non-coding RNAs mapped close to the 39 or 59 end of thousands of genes and many of these were transcribed from the opposite strand with respect to the closest gene, particularly regulatory-related genes. Conservation analyses showed that these ncRNAs were evolutionarily conserved, and many intergenic ncRNAs proximate to genes contained sequence-specific motifs. Correlation analysis of expression between these intergenic ncRNAs and protein-coding genes using RNA-seq data from a variety of tissues showed significant correlations with many transcripts. These results support the hypothesis that ncRNAs are common, transcribed in a regulated fashion and have regulatory functions. Copyright: ß 2012 Qu, Adelson. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Download full-text

Full-text

Available from: David L Adelson, Jul 06, 2015
0 Followers
 · 
191 Views
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: The maximal information coefficient (MIC) is a tool for finding the strongest pairwise relationships in a data set with many variables (Reshef et al., 2011). MIC is useful because it gives similar scores to equally noisy relationships of different types. This property, called {\em equitability}, is important for analyzing high-dimensional data sets. Here we formalize the theory behind both equitability and MIC in the language of estimation theory. This formalization has a number of advantages. First, it allows us to show that equitability is a generalization of power against statistical independence. Second, it allows us to compute and discuss the population value of MIC, which we call MIC_*. In doing so we generalize and strengthen the mathematical results proven in Reshef et al. (2011) and clarify the relationship between MIC and mutual information. Introducing MIC_* also enables us to reason about the properties of MIC more abstractly: for instance, we show that MIC_* is continuous and that there is a sense in which it is a canonical "smoothing" of mutual information. We also prove an alternate, equivalent characterization of MIC_* that we use to state new estimators of it as well as an algorithm for explicitly computing it when the joint probability density function of a pair of random variables is known. Our hope is that this paper provides a richer theoretical foundation for MIC and equitability going forward. This paper will be accompanied by a forthcoming companion paper that performs extensive empirical analysis and comparison to other methods and discusses the practical aspects of both equitability and the use of MIC and its related statistics.
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Many traits affecting profitability and sustainability of meat, milk, and fiber production are polygenic, with no single gene having an overwhelming influence on observed variation. No knowledge of the specific genes controlling these traits has been needed to make substantial improvement through selection. Significant gains have been made through phenotypic selection, enhanced by pedigree relationships and continually-improving statistical methodology. Genomic selection, recently enabled by assays for dense SNP located throughout the genome, promises to increase selection accuracy and accelerate genetic improvement by emphasizing the SNP most strongly correlated to phenotype, although the genes and sequence variants affecting phenotype remain largely unknown. These genomic predictions theoretically rely on linkage disequilibrium (LD) between genotyped SNP and unknown functional variants, but familial linkage may increase effectiveness when predicting individuals related to those in the training data. Genomic selection with functional SNP genotypes should be less reliant on LD patterns shared by training and target populations, possibly allowing robust prediction across unrelated populations. While the specific variants causing polygenic variation may never be known with certainty, a number of tools and resources can be employed to identify those most likely to affect phenotype. Associations of dense SNP genotypes with phenotype provide a one-dimensional approach for identifying genes affecting specific traits; in contrast, associations with multiple traits allow defining networks of genes interacting to affect correlated traits. Such networks are especially compelling when corroborated by existing functional annotation and established molecular pathways. The SNP occurring within network genes, obtained from public databases or derived from genome and transcriptome sequences, may be classified according to expected effects on gene products. As illustrated by functionally informed genomic predictions being more accurate than naive whole-genome predictions of beef tenderness, coupling evidence from livestock genotypes, phenotypes, gene expression and genomic variants with existing knowledge of gene functions and interactions may provide greater insight into the genes and genomic mechanisms affecting polygenic traits, and facilitate functional genomic selection for economically important traits.
    Journal of Animal Science 10/2012; 91(2). DOI:10.2527/jas.2012-5784 · 1.92 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: ncRNAs (non-coding RNAs), in particular long ncRNAs, represent a significant proportion of the vertebrate transcriptome and probably regulate many biological processes. We used publically available ESTs (Expressed Sequence Tags) from human, mouse and zebrafish and a previously published analysis pipeline to annotate and analyze the vertebrate non-protein-coding transcriptome. Comparative analysis confirmed some previously described features of intergenic ncRNAs, such as a positionally biased distribution with respect to regulatory or development related protein-coding genes, and weak but clear sequence conservation across species. Significantly, comparative analysis of developmental and regulatory genes proximate to long ncRNAs indicated that the only conserved relationship of these genes to neighbor long ncRNAs was with respect to genes expressed in human brain, suggesting a conserved, ncRNA cis-regulatory network in vertebrate nervous system development. Most of the relationships between long ncRNAs and proximate coding genes were not conserved, providing evidence for the rapid evolution of species-specific gene associated long ncRNAs. We have reconstructed and annotated over 130,000 long ncRNAs in these three species, providing a significantly expanded number of candidates for functional testing by the research community.
    PLoS ONE 12/2012; 7(12):e52275. DOI:10.1371/journal.pone.0052275 · 3.53 Impact Factor