-
ABSTRACT: Annotation Enrichment Analysis (AEA) is a widely used analytical approach to process data generated by high-throughput genomic and proteomic experiments such as gene expression microarrays. The analysis uncovers and summarizes discriminating background information (e.g. GO annotations) for sets of genes identified by experiments (e.g. a set of differentially expressed genes, a cluster). The discovered information is utilized by human experts to find biological interpretations of the experiments. However, AEA isolates and tests for overrepresentation only individual annotation terms or groups of similar terms, and is limited in its ability to uncover complex phenomena involving relationships between multiple annotation terms from various knowledge bases. Also, AEA assumes that annotations describe the whole object of interest, which makes it difficult to apply to sets of compound objects (e.g. sets of protein-protein interactions) and to sets of objects having an internal structure (e.g. protein complexes).
We propose a novel logic-based Annotation Concept Synthesis and Enrichment Analysis (ACSEA) approach. ACSEA fuses inductive logic reasoning with statistical inference to uncover more complex phenomena captured by the experiments. We evaluate our approach on large-scale datasets from several microarray experiments and on a clustered genome-wide genetic interaction network using different biological knowledge bases. The discovered interpretations have lower P-values than the interpretations found by AEA, are highly integrative in nature, and include analysis of quantitative and structured information present in the knowledge bases. The results suggest that ACSEA can boost effectiveness of the processing of high-throughput experiments.
Contact: mjiline@site.uottawa.ca
Bioinformatics 09/2011; 27(17):2391-8. · 5.47 Impact Factor
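The per-term overrepresentation test that the abstract above attributes to AEA can be illustrated with a short sketch. It assumes a one-sided hypergeometric test, which is a common choice for enrichment analysis; the abstract does not say which test is used, and the function name, parameter names, and example counts below are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, selected: int, hits: int) -> float:
    """Upper-tail hypergeometric probability P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes carrying the annotation term
    selected:   size of the experimentally identified gene set
    hits:       genes in the selected set carrying the term
    (All names are illustrative assumptions, not taken from the paper.)
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    # Sum the probability mass for every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 20000 background genes, 200 annotated,
    # 100 selected by the experiment, 10 of which carry the term.
    print(enrichment_p_value(20000, 200, 100, 10))
```

A small P-value here says only that one term is overrepresented in the set, which is the single-term limitation the abstract describes.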
-
ABSTRACT: Annotation Enrichment Analysis (AEA) is a widely used analytical approach to process data generated by high-throughput genomic and proteomic experiments such as gene expression microarrays. We briefly review ideas behind AEA, identify some limitations and propose a novel logic-based Annotation Concept Synthesis and Enrichment Analysis (ACSEA) approach. ACSEA fuses inductive logic reasoning with statistical inference to uncover more complex phenomena captured by the experiments. The results of the evaluation suggest that ACSEA can boost effectiveness of the processing of high-throughput experiments.
05/2010: pages 304-308;
-
Advances in Artificial Intelligence, 23rd Canadian Conference on Artificial Intelligence, Canadian, AI 2010, Ottawa, Canada, May 31 - June 2, 2010. Proceedings; 01/2010
-
ABSTRACT: In ribonucleic acid (RNA) molecules whose function depends on their final, folded three-dimensional shape (such as those in ribosomes or spliceosome complexes), the secondary structure, defined by the set of internal basepair interactions, is more consistently conserved than the primary structure, defined by the sequence of nucleotides.
The research presented here investigates the possibility of applying a progressive, pairwise approach to the alignment of multiple RNA sequences by simultaneously predicting an energy-optimized consensus secondary structure. We take an existing algorithm for finding the secondary structure common to two RNA sequences, Dynalign, and alter it to align profiles of multiple sequences. We then explore the relative successes of different approaches to designing the tree that will guide progressive alignments of sequence profiles to create a multiple alignment and prediction of conserved structure.
We have found that applying a progressive, pairwise approach to the alignment of multiple ribonucleic acid sequences produces highly reliable predictions of conserved basepairs, and we have shown how these predictions can be used as constraints to improve the results of a single-sequence structure prediction algorithm. However, we have also discovered that the amount of detail included in a consensus structure prediction is highly dependent on the order in which sequences are added to the alignment (the guide tree), and that if a consensus structure does not have sufficient detail, it is less likely to provide useful constraints for the single-sequence method.
BMC Bioinformatics 02/2007; 8:190. · 2.75 Impact Factor
-
ABSTRACT: Internal ribosome entry sites (IRES) allow ribosomes to be recruited to mRNA in a cap-independent manner. Some viruses that impair cap-dependent translation initiation utilize IRES to ensure that the viral RNA will efficiently compete for the translation machinery. IRES are also employed for the translation of a subset of cellular messages during conditions that inhibit cap-dependent translation initiation. IRES from viruses like Hepatitis C and Classical Swine Fever virus share a similar structure/function without sharing primary sequence similarity. Of the cellular IRES structures derived so far, none were shown to share an overall structural similarity. Therefore, we undertook a genome-wide search of human 5'UTRs (untranslated regions) with an empirically derived structure of the IRES from the key inhibitor of apoptosis, X-linked inhibitor of apoptosis protein (XIAP), to identify novel IRES that share structure/function similarity. Three of the top matches identified by this search that exhibit IRES activity are the 5'UTRs of Aquaporin 4, ELG1 and NF-kappaB repressing factor (NRF). The structures of AQP4 and ELG1 IRES have limited similarity to the XIAP IRES; however, they share trans-acting factors that bind the XIAP IRES. We therefore propose that cellular IRES are not defined by overall structure, as viral IRES, but are instead dependent upon short motifs and trans-acting factors for their function.
Nucleic Acids Research 02/2007; 35(14):4664-77. · 8.03 Impact Factor
-
ABSTRACT:
Background
In ribonucleic acid (RNA) molecules whose function depends on their final, folded three-dimensional shape (such as those in ribosomes or spliceosome complexes), the secondary structure, defined by the set of internal basepair interactions, is more consistently conserved than the primary structure, defined by the sequence of nucleotides.
Results
The research presented here investigates the possibility of applying a progressive, pairwise approach to the alignment of multiple RNA sequences by simultaneously predicting an energy-optimized consensus secondary structure. We take an existing algorithm for finding the secondary structure common to two RNA sequences, Dynalign, and alter it to align profiles of multiple sequences. We then explore the relative successes of different approaches to designing the tree that will guide progressive alignments of sequence profiles to create a multiple alignment and prediction of conserved structure.
Conclusion
We have found that applying a progressive, pairwise approach to the alignment of multiple ribonucleic acid sequences produces highly reliable predictions of conserved basepairs, and we have shown how these predictions can be used as constraints to improve the results of a single-sequence structure prediction algorithm. However, we have also discovered that the amount of detail included in a consensus structure prediction is highly dependent on the order in which sequences are added to the alignment (the guide tree), and that if a consensus structure does not have sufficient detail, it is less likely to provide useful constraints for the single-sequence method.
BMC Bioinformatics. 01/2007;
-
ABSTRACT: The cell has many ways to regulate the production of proteins. One mechanism is through changes to the machinery of translation initiation. These alterations favor the translation of one subset of mRNAs over another. It was first shown that internal ribosome entry sites (IRESes) within viral RNA genomes allowed the production of viral proteins more efficiently than most of the host proteins. The RNA secondary structure of viral IRESes has sometimes been conserved between viral species even though the primary sequences differ. These structures are important for IRES function, but no similar structure conservation has yet been shown in cellular IRESes. With the advances in mathematical modeling and computational approaches to complex biological problems, is there a way to predict an IRES in a data set of unknown sequences? This review examines what is known about cellular IRES structures, as well as the data sets and tools available to examine this question. We find that the lengths, number of upstream AUGs, and %GC content of 5'-UTRs of the human transcriptome have a similar distribution to those of published IRES-containing UTRs. Although the UTRs containing IRESes are on average longer, almost half of all 5'-UTRs are long enough to contain an IRES. Examination of the available RNA structure prediction software and RNA motif searching programs indicates that, while these programs are useful tools to fine-tune an empirically determined RNA secondary structure, accurate de novo secondary structure prediction of large RNA molecules, and the subsequent identification of new IRES elements by computational approaches, is still not possible.
RNA 11/2006; 12(10):1755-85. · 5.09 Impact Factor
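The three 5'-UTR descriptors compared in the abstract above (length, number of upstream AUGs, and %GC content) are simple to compute for a single sequence. The following is a minimal sketch, assuming the UTR is supplied as a plain nucleotide string and that upstream AUGs are counted in every reading frame; the function name and the returned keys are hypothetical and do not come from the review.

```python
def utr_features(utr: str) -> dict:
    """Return length, upstream AUG count, and %GC for one 5'-UTR sequence.

    Assumption: DNA input (with T) is accepted and converted to RNA (U).
    """
    seq = utr.upper().replace("T", "U")
    length = len(seq)
    # Count G and C bases for the GC percentage.
    gc = sum(1 for base in seq if base in "GC")
    # Count every AUG triplet, regardless of reading frame.
    upstream_augs = sum(1 for i in range(length - 2) if seq[i:i + 3] == "AUG")
    gc_percent = 100.0 * gc / length if length else 0.0
    return {"length": length, "upstream_augs": upstream_augs, "gc_percent": gc_percent}


if __name__ == "__main__":
    # Made-up sequence for illustration only.
    print(utr_features("GCAUGGCAUGAA"))
```

Applying such a function to every 5'-UTR in a data set gives the per-feature distributions that the review compares between the transcriptome and published IRES-containing UTRs.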
-
ABSTRACT: Recent experimental evidence has shown that ribonucleic acid (RNA) plays a greater role in the cell than previously thought. An ensemble of RNA sequences believed to contain signals at the structure level can be exploited to detect functional motifs common to all or a portion of those sequences. We present here a general framework for analyzing multiple RNA secondary structures. A family of related RNA structures may be analyzed using statistical regression methods. In this work, we extend our previously developed algorithm, seed, which exhaustively explores the search space of RNA sequence and structure motifs. We introduce several objective functions based on thermodynamic free energy and information content to discriminate native folds from the rest. We assume that the variation across the various scores can be represented by a statistical model. Regression analysis makes it possible to assign a separate weight to each score, allowing one to emphasize or compensate for the variance that differs across the scores. A statistical model can be formulated using techniques from regression analysis to obtain a template or scoring model that is able to identify putative functional regions in RNA sequences. We show that thermodynamics-based regression models are effective at associating the variation of scores obtained from different functions. The models can generally identify motifs with high specificity and positive predictive value with respect to known motifs. A good scoring method makes it possible to eliminate invalid motifs, thereby reducing the size of the hypothesis space.
Electrical and Computer Engineering, 2006. CCECE '06. Canadian Conference on; 06/2006
-
ABSTRACT: The identification of a consensus RNA motif often consists in finding a conserved secondary structure with minimum free energy in an ensemble of aligned sequences. However, an alignment is often difficult to obtain without prior structural information, hence the need for tools to automate this process.
We present an algorithm called Seed to identify all the conserved RNA secondary structure motifs in a set of unaligned sequences. The search space is defined as the set of all the secondary structure motifs inducible from a seed sequence. A general-to-specific search allows finding all the motifs that are conserved. Suffix arrays are used to enumerate efficiently all the biological palindromes as well as for the matching of RNA secondary structure expressions. We assessed the ability of this approach to uncover known structures using four datasets. The enumeration of the motifs relies only on the secondary structure definition and conservation, therefore allowing for the independent evaluation of scoring schemes. Twelve simple objective functions based on free energy were evaluated for their potential to discriminate native folds from the rest.
Our evaluation shows that 1) support and exclusion constraints are sufficient to make an exhaustive search of the secondary structure space feasible; 2) the search space induced from a seed sequence contains known motifs; and 3) simple objective functions, consisting of a combination of the free energy of matching sequences, can generally identify motifs with high positive predictive value and sensitivity to known motifs.
BMC Bioinformatics 02/2006; 7:244. · 2.75 Impact Factor
-
Proceedings of the 2006 International Conference on Bioinformatics & Computational Biology, BIOCOMP'06, Las Vegas, Nevada, USA, June 26-29, 2006; 01/2006
-
ABSTRACT: Comparative RNA sequence analyses have contributed remarkably accurate predictions, and the recent determination of the 30S and 50S ribosomal subunits brings more supporting evidence. Several inference tools combine free energy minimisation and comparative analysis to improve the quality of secondary structure predictions. This paper investigates the following hypotheses: the use of three input sequences improves the average accuracy compared to predictions based on two input sequences; the worst prediction (minimum accuracy) for any sequence should be more accurate when three input sequences are used rather than two; finally, the consensus structure of three sequences is probably less representative of the individual sequences, so the average coverage should be lower.
International Journal of Bioinformatics Research and Applications 02/2005; 1(2):230-45.
-
Computational Science - ICCS 2005, 5th International Conference, Atlanta, GA, USA, May 22-25, 2005, Proceedings, Part II; 01/2005
-