Prediction of microRNA-regulated protein interaction pathways in Arabidopsis using machine learning algorithms

Article in Computers in Biology and Medicine 43(11):1645-52 · November 2013
  • ... The present authors adopted amino acid composition profile information with the SVM classifier to improve protein complexes classification [29]. Additionally, we also proposed identifying microRNAs target of Arabidopsis thaliana by integrating prediction scores from PITA, miRanda, and RNAHybrid algorithms [30]. Recently, Li et al. [31] used random forest machine learning algorithm and topology features to identify the functions of protein complexes. ...
    ... We assumed the " -" type PPI as a positive set and the rest as a negative set. According to our previous work [30], the balanced trained dataset usually has better performance than the unbalanced one; hence, the algorithms are trained with an equal size ratio of 1 : 1 for the positive and negative dataset. Since the sizes of the original positive and negative sets differ by a factor of about 6 (unbalanced learning set), to generate a balanced learning set, the 15214 positive target interactions (cancer PPIs) were kept, and a total of 15214 noncancer PPIs were randomly selected from the negative set. ...
    Article
    Full-text available
    Many proteins are known to be associated with cancer diseases. It is quite often that their precise functional role in disease pathogenesis remains unclear. A strategy to gain a better understanding of the function of these proteins is to make use of a combination of different aspects of proteomics data types. In this study, we extended Aragues's method by employing the protein-protein interaction (PPI) data, domain-domain interaction (DDI) data, weighted domain frequency score (DFS), and cancer linker degree (CLD) data to predict cancer proteins. Performances were benchmarked based on three kinds of experiments as follows: (I) using individual algorithm, (II) combining algorithms, and (III) combining the same classification types of algorithms. When compared with Aragues's method, our proposed methods, that is, machine learning algorithm and voting with the majority, are significantly superior in all seven performance measures. We demonstrated the accuracy of the proposed method on two independent datasets. The best algorithm can achieve a hit ratio of 89.4% and 72.8% for lung cancer dataset and lung cancer microarray study, respectively. It is anticipated that the current research could help understand disease mechanisms and diagnosis.
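The 1:1 balancing step described in the excerpt above (keep every positive interaction, draw an equally sized random subset of the negatives) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `balance_by_undersampling`, the fixed seed, and the toy identifiers are all assumptions made here.

```python
import random

def balance_by_undersampling(positives, negatives, seed=0):
    """Return (positives, sampled_negatives) with a 1:1 size ratio.

    Hypothetical helper: keeps all positives and draws the same number
    of negatives uniformly at random, without replacement.
    """
    if len(negatives) < len(positives):
        raise ValueError("need at least as many negatives as positives")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sampled = rng.sample(negatives, k=len(positives))
    return list(positives), sampled

# Toy stand-ins for the interaction sets (hypothetical identifiers).
positives = [f"pos_{i}" for i in range(5)]
negatives = [f"neg_{i}" for i in range(30)]  # about 6x larger, as in the excerpt
pos, neg = balance_by_undersampling(positives, negatives)
```

Sampling without replacement keeps the negative examples distinct; a different seed gives a different but equally sized negative subset.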
  • ... However, accurate identification of miR targets is an essential step for a candidate miR to avoid unwanted suppression of off-targets. Toward this end, an integrative machine learning method was used to predict miR target genes, which include disease resistance genes and TF, to get novel insights into plant-pathogen interaction networks (Kurubanjerdjit et al. 2013). ...
    Article
    Full-text available
    Systems biology is an inclusive approach to study the static and dynamic emergent properties on a global scale by integrating multiomics datasets to establish qualitative and quantitative associations among multiple biological components. With an abundance of improved high throughput -omics datasets, network-based analyses and machine learning technologies are playing a pivotal role in comprehensive understanding of biological systems. Network topological features reveal most important nodes within a network as well as prioritize significant molecular components for diverse biological networks, including coexpression, protein-protein interaction, and gene regulatory networks. Machine learning techniques provide enormous predictive power through specific feature extraction from biological data. Deep learning, a subtype of machine learning, has plausible future applications because a domain expert for feature extraction is not needed in this algorithm. Inspired by diverse domains of biology, we here review classic systems biology techniques applied in plant immunity thus far. We also discuss additional advanced approaches in both graph theory and machine learning, which may provide new insights for understanding plant-microbe interactions. Finally, we propose a hybrid approach in plant immune systems that harnesses the power of both network biology and machine learning, with a potential to be applicable to both model systems and agronomically important crop plants.
  • ... The machine learning algorithms were implemented in the Weka software tool, and a ten-fold cross-validation test was used to train the supervised model. Based on our previous studies [30,31], a balanced data set typically provides better performance than an unbalanced one, so the machine learning algorithms were trained using positive and negative datasets that contained equal numbers of data points. Experimental results revealed that the proposed machine learning method identified cancer proteins with relatively high hit ratios (about 80%). ...
    Article
    Full-text available
    Non-small cell lung cancer (NSCLC) is one of the leading causes of death globally, and research into NSCLC has been accumulating steadily over several years. Drug repositioning is the current trend in the pharmaceutical industry for identifying potential new uses for existing drugs and accelerating the development process of drugs, as well as reducing side effects. This work integrates two approaches - machine learning algorithms and topological parameter-based classification - to develop a novel pipeline of drug repositioning to analyze four lung cancer microarray datasets, enriched biological processes, potential therapeutic drugs and targeted genes for NSCLC treatments. A total of 7 (8) and 11 (12) promising drugs (targeted genes) were discovered for treating early- and late-stage NSCLC, respectively. The effectiveness of these drugs is supported by the literature, experimentally determined in-vitro IC 50 and clinical trials. This work provides better drug prediction accuracy than competitive research according to IC 50 measurements. With the novel pipeline of drug repositioning, the discovery of enriched pathways and potential drugs related to NSCLC can provide insight into the key regulators of tumorigenesis and the treatment of NSCLC. Based on the verified effectiveness of the targeted drugs predicted by this pipeline, we suggest that our drug-finding pipeline is effective for repositioning drugs.
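The ten-fold cross-validation mentioned in the excerpt above was run in Weka. The sketch below only illustrates the underlying idea in plain Python: the data are shuffled, split into ten nearly equal folds, and each fold serves once as the test set. The function name `k_fold_indices` is hypothetical, and the split is unstratified, which may differ from the configuration the authors used.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for plain k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# 25 toy samples split into 10 folds.
splits = list(k_fold_indices(25, k=10))
```

Every sample appears in exactly one test fold, so averaging a metric over the ten folds uses each sample once for evaluation.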
  • ... MiRNA and PPI play an important role in the infection process; some critical inter-species interactions such as HPI and pathogenicity occur through PPI [33]. GO annotation of miRNA-regulated PPI with PRG and TF information was implemented. ...
    Article
    Full-text available
    In this study, the microarray data for Arabidopsis thaliana infected with Xanthomonas campestris pv. campestris (Xcc) are analyzed: differentially expressed genes (DEGs) are identified, and Gene Set Enrichment Analysis (GSEA) is employed for analysis. As a result, highly relevant pathogen-resistant pathways are inferred. DEGs are also determined for other conditions, such as infection by different strains of Agrobacterium tumefaciens.
  • ... MiRNA and PPI play an important role in the infection process; some critical inter-species interactions such as HPI and pathogenicity occur through PPI [45]. GO annotation of miRNA-regulated PPI with PRG and TF information was implemented. ...
    Article
    Full-text available
    In this study, the microarray data for Arabidopsis thaliana infected with Xanthomonas campestris pv. campestris (Xcc) are analyzed: differentially expressed genes (DEGs) are identified, and Gene Set Enrichment Analysis (GSEA) is employed for functional study. As a result, highly relevant pathogen-resistant pathways are inferred. The analyses suggested that certain proteins, i.e., SGT1, HSP and SEC, and secondary metabolites are actively involved in the plant defense mechanism. DEGs are also determined for other conditions, such as infection by different strains of Agrobacterium tumefaciens. Furthermore, protein-protein interaction (PPI) plays an important role in host-pathogen interaction (HPI). Gene Ontology (GO) annotation for microRNA-regulated PPI, pathogen-resistant genes and transcription factor information is implemented; such resources can provide new insights for microRNA-regulated PPI networks in HPI study. In addition, microRNA-regulated PPI networks are clustered using the k-means algorithm, and the results are further analyzed by GSEA. It was found that the hormone-mediated signaling pathway plays an essential role in HPI.
  • ... The 3'-UTR sequences of oyster genes were obtained based on the genome annotation information and subjected to miRanda [59] for target prediction of those neurotransmitter-responsive miRNAs. Genes were regarded as putative targets if they satisfied the criteria with free energy less than −25 kcal/mol and score value higher than 160. ...
    Article
    Full-text available
    Background: The neural-endocrine-immune (NEI) system is a major modulation network among the nervous, endocrine and immune systems and weighs heavily in maintaining homeostasis of organisms during stress and infection. Some microRNAs are found to interact with the NEI system (designated NeurimmiRs), exerting swift modulation of the immune system. The oyster Crassostrea gigas, as an intertidal bivalve, has evolved a primary NEI system. However, knowledge about NeurimmiRs in oysters remains very limited. Results: Six small RNA libraries from haemocytes of oysters stimulated with acetylcholine (ACh) and norepinephrine (NE) were sequenced to identify neurotransmitter-responsive miRNAs and survey their immunomodulation roles. A total of 331 miRNAs (132 identified in the present study plus 199 identified previously) were subjected to expression analysis, and twenty-one and sixteen of them were found to be ACh- or NE-responsive, respectively (FDR < 0.05). Meanwhile, 21 miRNAs exhibited different expression patterns after ACh or NE stimulation. Consequently, 355 genes were predicted as putative targets of these neurotransmitter-responsive miRNAs in oyster. Through gene ontology analysis, multiple genes involved in death, immune system process and response to stimulus were annotated as being modulated by NeurimmiRs. In addition, a significant decrease in haemocyte phagocytosis and in the late-apoptosis or necrosis rate was observed after ACh and NE stimulation (p < 0.05), while the early-apoptosis rate remained unchanged. Conclusions: A comprehensive immune-related network involving PRRs, intracellular receptors, signaling transducers and immune effectors was proposed to be modulated by ACh- and NE-responsive NeurimmiRs, which would be indispensable for oyster haemocytes to respond against stress and infection. Characterization of the NeurimmiRs would be an essential step towards understanding the NEI system of invertebrates and the adaptation mechanism of the oyster.
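The target-selection criteria quoted in the excerpt above (free energy below −25 kcal/mol and score above 160) amount to a two-threshold filter over predicted hits. The sketch below assumes the miRanda output has already been parsed into records; the `Hit` type, the field names and the example rows are hypothetical, and only the two cut-offs come from the excerpt.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    """One predicted miRNA-target pair (hypothetical record layout)."""
    mirna: str
    gene: str
    score: float   # alignment score reported by the predictor
    energy: float  # predicted duplex free energy, kcal/mol

MAX_ENERGY = -25.0  # keep hits with free energy strictly below this
MIN_SCORE = 160.0   # keep hits with score strictly above this

def is_putative_target(hit):
    """Apply both cut-offs; a hit must satisfy each one."""
    return hit.energy < MAX_ENERGY and hit.score > MIN_SCORE

# Made-up rows for illustration only.
hits = [
    Hit("miR-a", "gene1", 172.0, -28.3),  # passes both cut-offs
    Hit("miR-a", "gene2", 165.0, -21.0),  # free energy too high
    Hit("miR-b", "gene3", 150.0, -30.1),  # score too low
    Hit("miR-b", "gene4", 160.0, -25.0),  # on the boundary, rejected
]
putative = [h for h in hits if is_putative_target(h)]
```

Both comparisons are strict, matching the wording "less than" and "higher than"; a hit exactly on either boundary is excluded.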
  • ... Machine learning algorithms are extensively used in computational biology to give reliable miRNA target predictions [18,19]. The general approach is that we train one or multiple classifiers using both positive and negative training data and then predict new examples in a test set. ...
    Article
    Full-text available
    MicroRNAs are known to play important roles in the transcriptional and post-transcriptional regulation of gene expression. While intensive research has been conducted to identify miRNAs and their target genes in various genomes, there is only limited knowledge about how microRNAs are regulated. In this study, we construct a pipeline that can infer the regulatory relationships between transcription factors and microRNAs from ChIP-Seq data with high confidence. In particular, after identifying candidate peaks from ChIP-Seq data, we formulate the inference as a PU learning (learning from only positive and unlabeled examples) problem. Multiple features including the statistical significance of the peaks, the location of the peaks, the transcription factor binding site motifs, and the evolutionary conservation are derived from peaks for training and prediction. To further improve the accuracy of our inference, we also apply a mean reciprocal rank (MRR)-based method to the candidate peaks. We apply our pipeline to infer TF-miRNA regulatory relationships in mouse embryonic stem cells. The experimental results show that our approach provides very specific findings of TF-miRNA regulatory relationships.
  • ... We designed a machine learning algorithm, a rapid miner workflow (RMA-WF) [43][44] [85][86][87] based on support vector machines (SVM). ...
    Article
    Full-text available
    Defining the aggressiveness and growth rate of a malignant cell population is a key step in the clinical approach to treating tumor disease. The correct grading of breast cancer (BC) is a fundamental part in determining the appropriate treatment. Biological variables can make it difficult to elucidate the mechanisms underlying BC development. To identify potential markers that can be used for BC classification, we analyzed mRNAs expression profiles, gene copy numbers, microRNAs expression and their association with tumor grade in BC microarray-derived datasets. From mRNA expression results, we found that grade 2 BC is most likely a mixture of grade 1 and grade 3 that have been misclassified, being described by the gene signature of either grade 1 or grade 3. We assessed the potential of the new approach of integrating mRNA expression profile, copy number alterations, and microRNA expression levels to select a limited number of genomic BC biomarkers. The combination of mRNA profile analysis and copy number data with microRNA expression levels led to the identification of two gene signatures of 42 and 4 altered genes (FOXM1, KPNA4, H2AFV and DDX19A) respectively, the latter obtained through a meta-analytical procedure. The 42-based gene signature identifies 4 classes of up- or down-regulated microRNAs (17 microRNAs) and of their 17 target mRNA, and the 4-based genes signature identified 4 microRNAs (Hsa-miR-320d, Hsa-miR-139-5p, Hsa-miR-567 and Hsa-let-7c). These results are discussed from a biological point of view with respect to pathological features of BC. Our identified mRNAs and microRNAs were validated as prognostic factors of BC disease progression, and could potentially facilitate the implementation of assays for laboratory validation, due to their reduced number.
  • Article
    Background Rapid identification of new essential genes is necessary to understand biological mechanisms and identify potential targets for antimicrobial drugs. Many computational methods have been proposed. Objectives To construct an essential genes classifier which satisfies more different organisms, and to study the redundancy of features used in the prediction of essential genes. Methods We designed a 57-12-1 artificial neural network model to predict the essential genes of 31 prokaryotic genomes. Four methods including self-predictions of each organism, the leave-one-genome-out method, predicting all by one organism, and self-predictions of all organisms were applied to assess the predictive performance. Additionally, the 57 features used in the artificial neural network model were analyzed by weighted principal component analysis to screen the key features strongly related to the essentiality of genes. Results Our results compared with previous researches indicate that our models had better generalizability. Furthermore, this method reduced the features to 29 while maintaining stable prediction performance overall, suggesting that some features are redundant for gene essentiality, and the screened features contained more important biological information for gene essentiality. Conclusion This study showed the effectiveness and generalizability of our artificial neural network model. In addition, the screened features could be used as key features in computational analysis and biological experiments.
  • Article
    The unicellular green alga Chlamydomonas reinhardtii harbors many types of small RNAs (sRNAs) but little is known about their role(s) in the regulation of endogenous genes and cellular processes. To define functional microRNAs (miRNAs) in Chlamydomonas, we characterized sRNAs associated with an argonaute protein, AGO3, by affinity purification and deep sequencing. Using a stringent set of criteria for canonical miRNA annotation, we identified 39 precursor miRNAs, which produce 45 unique, AGO3-associated miRNA sequences including 13 previously reported miRNAs and 32 novel ones. Potential miRNA targets were identified based on the complementarity of miRNAs with candidate binding sites on transcripts, and classified, depending on the extent of complementarity, as being likely to be regulated through cleavage or translational repression. The search for cleavage targets identified 74 transcripts. However, only six of them showed an increase in mRNA levels in a mutant strain almost devoid of sRNAs. The search for translational repression targets, which used complementarity criteria more stringent than those empirically required for a reduction in target protein levels, identified 488 transcripts. However, unlike observations in metazoans, most predicted translation repression targets did not show appreciable changes in transcript abundance in the absence of sRNAs. Additionally, of three candidate targets examined at the protein level, only one showed a moderate variation in polypeptide amount in the mutant strain. Our results emphasize the difficulty in identifying genuine miRNA targets in Chlamydomonas and suggest that miRNAs, under standard laboratory conditions, might have mainly a modulatory role in endogenous gene regulation in this alga. Copyright © 2015, The Genetics Society of America.
  • Prediction of microRNA-regulated A. thaliana-Xcc protein interaction pathways
    N. Kurubanjerdjit, J.J.P. Tsai, K.-L. Ng, Prediction of microRNA-regulated A. thaliana-Xcc protein interaction pathways, International Conference on Agricultural, Environment and Biological Sciences (ICAEBS'2012), May 26–27, 2012, Phuket, Thailand (2012).
  • MicroRNA targets in Drosophila
    A.J. Enright, B. John, U. Gaul, T. Tuschl, C. Sander, D.S. Marks, MicroRNA targets in Drosophila, Genome Biol. 5 (2003) R1.
  • Human MicroRNA Targets
    B. John, A.J. Enright, A. Aravin, T. Tuschl, C. Sander, D.S. Marks, Human MicroRNA Targets, PLoS Biol. 2 (11) (2004) e363.
  • Yu-Liang Lee
    Yu-Liang Lee received the M.S. degree from the Department of Bioinformatics, Asia University, Taiwan, in 2009. In April 2011, he joined the Department of Biomedical Informatics, Asia University, as a Research Assistant. His research interests include database design, web programming and data mining.
  • The role of site accessibility in microRNA target recognition
    M. Kertesz, N. Iovino, U. Unnerstall, U. Gaul, E. Segal, The role of site accessibility in microRNA target recognition, Nat. Genet. 39 (10) (2007) 1278-1284.
  • Chien-Hung Huang
    Chien-Hung Huang received the B.S. degree in computer science from Tatung Institute of Technology, Taipei, Taiwan, in 1991, and the Ph.D. degree in Computer and Information Engineering from National Tsing Hua University, Hsinchu, Taiwan, in 1999. From 1999 to 2004, he was on the faculty of Ling Tung University. Currently, he is an associate professor at the Department of Computer and Information Engineering, National Formosa University. His research interests include bioinformatics, data hiding, algorithms and open source distributions.
  • Comprehensive prediction of novel microRNA targets in Arabidopsis thaliana
    L. Alves-Junior, S. Niemeier, A. Hauenschild, M. Rehmsmeier, T. Merkle, Comprehensive prediction of novel microRNA targets in Arabidopsis thaliana, Nucleic Acids Res. 37 (2009) 4010-4021.
  • miTarget: microRNA target gene prediction using a support vector machine
    S.K. Kim, J.W. Nam, J.K. Rhee, W.J. Lee, B.T. Zhang, miTarget: microRNA target gene prediction using a support vector machine, BMC Bioinformatics 7 (1) (2006) 411.
  • Article
    Full-text available
    microRNAs are short RNA fragments that can regulate the expression of hundreds of target genes. Currently, due to the lack of high-throughput experimental methods for miRNA target identification, a collection of computational target prediction approaches has been developed. However, these approaches rely on different features or weight the same factors differently, resulting in a diverse range of predictions, and their prediction accuracy remains uncertain. In this paper, three commonly used target prediction algorithms are evaluated and further integrated using algorithm combination, ranking aggregation and Bayesian Network classification. Our results revealed that each individual prediction algorithm has its own advantages, as shown on different test data sets. Among the integration strategies, applying a Bayesian Network classifier to the features calculated from multiple prediction methods significantly improved target prediction accuracy.
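The abstract above mentions ranking aggregation without saying which scheme was used. As one possible illustration, the sketch below ranks candidates separately under each predictor and orders them by mean rank. The mean-rank rule, the function names, the "higher score is better" convention and the toy scores are all assumptions made here, not details taken from the paper.

```python
def ranks(scores):
    """Map each candidate to its rank under one predictor (1 = best).

    Assumes a higher score means a stronger prediction.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {cand: pos for pos, cand in enumerate(ordered, start=1)}

def mean_rank_aggregate(score_tables):
    """Order the candidates scored by every predictor by their mean rank."""
    per_tool = [ranks(table) for table in score_tables]
    candidates = set.intersection(*(set(table) for table in score_tables))
    mean_rank = {c: sum(r[c] for r in per_tool) / len(per_tool) for c in candidates}
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(candidates, key=lambda c: (mean_rank[c], c))

# Toy scores from three hypothetical predictors for three candidate targets.
tool_a = {"t1": 0.9, "t2": 0.5, "t3": 0.1}
tool_b = {"t1": 0.5, "t2": 0.8, "t3": 0.4}
tool_c = {"t1": 0.7, "t2": 0.6, "t3": 0.3}
consensus = mean_rank_aggregate([tool_a, tool_b, tool_c])
```

Working on ranks rather than raw scores sidesteps the fact that different predictors report scores on incomparable scales; predictors whose scores are "lower is better" (for example, free energies) would need their sign flipped first.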
  • Article
    Machine learning is already a mature field with significant theoretical work and an impressive suite of applications. I will discuss learning algorithms together with some example applications, as well as the current challenges and research areas. WIREs Comp Stat 2011 3 195–203 DOI: 10.1002/wics.166 For further resources related to this article, please visit the WIREs website.
  • Article
    This paper reports a scalar implementation of a multi-dimensional direct simulation Monte Carlo (DSMC) package named “Generalized Rarefied gAs Simulation Package” (GRASP). This implementation adopts a concept of simulation engine and it utilizes many Object-Oriented Programming features and software engineering design patterns. As a result, this implementation successfully resolves the problem of program functionality and interface conflictions for multi-dimensional DSMC implementations. The package has an open architecture which benefits further development and code maintenance. To reduce engineering time for three-dimensional simulations, one effective implementation is to adopt a hybrid grid scheme with a flexible data structure, which can automatically treat cubic cells adjacent to object surfaces. This package can utilize traditional structured, unstructured or hybrid grids to model multi-dimensional complex geometries and simulate rarefied non-equilibrium gas flows. Benchmark test cases indicate that this implementation has satisfactory accuracy for complex rarefied gas flow simulations.
  • Article
    The software has been designed specifically for the concurrent application of statistical tolerance (ST) and statistical process control (SPC). The software system includes two function modules for data: the data acquisition module, for data collection and input, and the data processing module, for data storage and management, statistical analysis and control chart drawing. The system provides a set of functional modules for three tasks: ST and SPC design, drawing control charts, and verification to assure the designated process quality indices (PQIs) that represent the required process quality. Some considerations for the software design are discussed, and the mathematical modeling for ST and PQIs is given. The main steps for ST and PQI verification with the controlled process data are also introduced. The software for quality-oriented ST and SPC has been tested for its computing accuracy and precision. It provides an efficient tool for ST and SPC parameter design and a convenient communication platform for product designers, manufacturing engineers and quality engineers.
  • Article
    Full-text available
    The AthaMap database generates a genome-wide map for putative transcription factor binding sites for A. thaliana. When analyzing transcriptional regulation using AthaMap it may be important to learn which genes are also post-transcriptionally regulated by inhibitory RNAs. Therefore, a unified database for transcriptional and post-transcriptional regulation will be highly useful for the analysis of gene expression regulation. To identify putative microRNA target sites in the genome of A. thaliana, processed mature miRNAs from 243 annotated miRNA genes were used for screening with the psRNATarget web server. Positional information, target genes and the psRNATarget score for each target site were annotated to the AthaMap database. Furthermore, putative target sites for small RNAs from seven small RNA transcriptome datasets were used to determine small RNA target sites within the A. thaliana genome. Putative 41,965 genome wide miRNA target sites and 10,442 miRNA target genes were identified in the A. thaliana genome. Taken together with genes targeted by small RNAs from small RNA transcriptome datasets, a total of 16,600 A. thaliana genes are putatively regulated by inhibitory RNAs. A novel web-tool, 'MicroRNA Targets', was integrated into AthaMap which permits the identification of genes predicted to be regulated by selected miRNAs. The predicted target genes are displayed with positional information and the psRNATarget score of the target site. Furthermore, putative target sites of small RNAs from selected tissue datasets can be identified with the new 'Small RNA Targets' web-tool. The integration of predicted miRNA and small RNA target sites with transcription factor binding sites will be useful for AthaMap-assisted gene expression analysis. URL: http://www.athamap.de/
  • Article
    Full-text available
    MicroRNAs (miRNAs) are major regulators of gene expression in multicellular organisms. They recognize their targets by sequence complementarity and guide them to cleavage or translational arrest. It is generally accepted that plant miRNAs have extensive complementarity to their targets and their prediction usually relies on the use of empirical parameters deduced from known miRNA–target interactions. Here, we developed a strategy to identify miRNA targets which is mainly based on the conservation of the potential regulation in different species. We applied the approach to expressed sequence tags datasets from angiosperms. Using this strategy, we predicted many new interactions and experimentally validated previously unknown miRNA targets in Arabidopsis thaliana. Newly identified targets that are broadly conserved include auxin regulators, transcription factors and transporters. Some of them might participate in the same pathways as the targets known before, suggesting that some miRNAs might control different aspects of a biological process. Furthermore, this approach can be used to identify targets present in a specific group of species, and, as a proof of principle, we analyzed Solanaceae-specific targets. The presented strategy can be used alone or in combination with other approaches to find miRNA targets in plants.
  • Article
    Full-text available
    Background Plant microRNAs (miRNAs) have been revealed to play important roles in developmental control, hormone secretion, cell differentiation and proliferation, and response to environmental stresses. However, our knowledge about the regulatory mechanisms and functions of miRNAs remains very limited. The main difficulties lie in two aspects. On one hand, the number of experimentally validated miRNA targets is very limited and the predicted targets often include many false positives, which limits our ability to reveal the functions of miRNAs. On the other hand, the regulation of miRNAs is known to be spatio-temporally specific, which makes it harder to understand the regulatory mechanisms of miRNAs. Description In this paper we present miRFANs, an online database for Arabidopsis thaliana miRNA function annotations. We integrated various types of datasets, including miRNA-target interactions, transcription factors (TFs) and their targets, expression profiles, genomic annotations and pathways, into a comprehensive database, and developed various statistical and mining tools, together with a user-friendly web interface. For each miRNA target predicted by psRNATarget, TargetAlign and UEA target-finder, or recorded in TarBase and miRTarBase, the effect of its up-regulated or down-regulated miRNA on the expression level of the target gene is evaluated by carrying out differential expression analysis of both miRNA and target expression profiles acquired under the same (or similar) experimental condition and in the same tissue. Moreover, each miRNA target is associated with gene ontology and pathway terms, together with the target site information and regulating miRNAs predicted by different computational methods. These associated terms may provide valuable insight into the functions of each miRNA. Conclusion First, a comprehensive collection of miRNA targets for Arabidopsis thaliana provides valuable information about the functions of plant miRNAs. Second, a highly informative miRNA-mediated genetic regulatory network is extracted from our integrative database. Third, a set of statistical and mining tools is provided for analyzing and mining the database. And fourth, a user-friendly web interface is developed to facilitate the browsing and analysis of the collected data.
  • Article
    Full-text available
    The mechanisms by which nitrate is transported into the roots have been characterized both at physiological and molecular levels. It has been demonstrated that nitrate is taken up in an energy-dependent way by a four-component uptake machinery involving high- and low- affinity transport systems. In contrast very little is known about the physiology of nitrate transport towards different plant tissues and in particular at the leaf level. The mechanism of nitrate uptake in leaves of cucumber (Cucumis sativus L. cv. Chinese long) plants was studied and compared with that of the root. Net nitrate uptake by roots of nitrate-depleted cucumber plants proved to be substrate-inducible and biphasic showing a saturable kinetics with a clear linear non saturable component at an anion concentration higher than 2 mM. Nitrate uptake by leaf discs of cucumber plants showed some similarities with that operating in the roots (e.g. electrogenic H+ dependence via involvement of proton pump, a certain degree of induction). However, it did not exhibit typical biphasic kinetics and was characterized by a higher Km with values out of the range usually recorded in roots of several different plant species. The quantity and activity of plasma membrane (PM) H+-ATPase of the vesicles isolated from leaf tissues of nitrate-treated plants for 12 h (peak of nitrate foliar uptake rate) increased with respect to that observed in the vesicles isolated from N-deprived control plants, thus suggesting an involvement of this enzyme in the leaf nitrate uptake process similar to that described in roots. Molecular analyses suggest the involvement of a specific isoform of PM H+-ATPase (CsHA1) and NRT2 transporter (CsNRT2) in root nitrate uptake. At the leaf level, nitrate treatment modulated the expression of CsHA2, highlighting a main putative role of this isogene in the process. 
The results provide, for the first time, evidence that a saturable and substrate-inducible nitrate uptake mechanism operates in cucumber leaves. Its activity appears to be related to PM H+-ATPase activity and in particular to the induction of the CsHA2 isoform. However, the question of which molecular entity is responsible for the transport of nitrate into leaf cells remains unresolved.
  • Article
    Support vector machine (SVM) is a popular technique for classification. However, beginners who are not familiar with SVM often get unsatisfactory results since they miss some easy but significant steps. In this guide, we propose a simple procedure, which usually gives reasonable results.
  • Article
    Full-text available
    Plant endogenous short non-coding small RNAs (20-24 nt), including microRNAs (miRNAs) and a subset of small interfering RNAs (ta-siRNAs), play important roles in gene expression regulatory networks (GRNs). For example, many transcription factors and development-related genes have been reported as targets of these regulatory small RNAs. Although a number of miRNA target prediction algorithms and programs have been developed, most of them were designed for animal miRNAs, which are significantly different from plant miRNAs in the target recognition process. These differences demand the development of separate plant miRNA (and ta-siRNA) target analysis tool(s). We present psRNATarget, a plant small RNA target analysis server, which features two important analysis functions: (i) reverse complementary matching between small RNA and target transcript using a proven scoring schema, and (ii) target-site accessibility evaluation by calculating the unpaired energy (UPE) required to 'open' secondary structure around the small RNA's target site on mRNA. psRNATarget incorporates recent discoveries in plant miRNA target recognition, e.g. it distinguishes translational and post-transcriptional inhibition, and it reports the number of small RNA/target site pairs that may affect small RNA binding activity to the target transcript. The psRNATarget server is designed for high-throughput analysis of next-generation data with an efficient distributed computing back-end pipeline that runs on a Linux cluster. The server front-end integrates three simplified user-friendly interfaces to accept user-submitted or preloaded small RNAs and transcript sequences, and outputs a comprehensive list of small RNA/target pairs along with online tools for batch downloading, keyword searching and results sorting. The psRNATarget server is freely available at http://plantgrn.noble.org/psRNATarget/.
  • Article
    The goal of the Biological General Repository for Interaction Datasets (BioGRID) (http://www.thebiogrid.org) is to archive and freely disseminate collections of genetic and protein interactions from major model organisms. BioGRID currently houses over 335,000 interactions curated from high-throughput datasets and individual focused studies found in the primary literature, as derived from some 23,000 publications. Complete coverage of the entire literature for both the budding yeast _Saccharomyces cerevisiae_ and the fission yeast _Schizosaccharomyces pombe_ has been achieved, resulting in the curation of over 246,000 interactions, and efforts to expand curation across multiple species are underway. Through collaborations with the Gene Ontology (GO) Consortium and the Linking Animal Models to Human Disease Initiative (LAMHDI), we are focusing our curation efforts across model organisms on particular areas of biology to enable insights into conserved networks and pathways that are relevant to human health. The BioGRID 3.0 web interface contains new search and display features that enable rapid queries across multiple data types and sources. A dedicated Interaction Management System (IMS) is used to track all curation and to prioritize publications across multiple curation projects. BioGRID data are incorporated in several model organism databases and other biological databases. The entire BioGRID interaction collection may be downloaded in multiple file formats, including PSI MI XML, and source code for BioGRID is freely available without any restrictions. This work is supported by NIH NCRR grant R01 RR024031 to MT and KD, and by grants from the CIHR and BBSRC to MT.
  • Article
    Full-text available
    The target recognition mechanism of plant microRNAs (miRNAs) was once thought to be simple and straightforward, i.e. through perfect reverse complementary matching; therefore, very few target prediction tools and algorithms were developed for plants as compared to those for animals. However, the discovery of transcription suppression and the more recent observation of widespread translational regulation by miRNAs highlight the enormous diversity and complexity of gene regulation in plant systems. This, in turn, necessitates advanced computational tools/algorithms for comprehensive miRNA target analysis to help understand miRNA regulatory mechanisms. Yet, advanced/comprehensive plant miRNA target analysis tools are still lacking despite the desirability and importance of such tools, especially the ability to predict translational inhibition and integrate transcriptome data. This review focuses on recent progress in understanding the plant miRNA target recognition mechanism, principles of target prediction based on these understandings, comparison of current prediction tools and algorithms for plant miRNA target analysis, and the outlook for future directions in the development of plant miRNA target tools and algorithms.
  • Article
    The miRror application provides insights on microRNA (miRNA) regulation. It is based on the notion of a combinatorial regulation by an ensemble of miRNAs or genes. miRror integrates predictions from a dozen miRNA resources that are based on complementary algorithms into a unified statistical framework. For a set of miRNAs given as input, the online tool provides a ranked list of targets, based on a set of resources selected by the user, according to their significance of being coordinately regulated. Symmetrically, a set of genes can be used as input to suggest a set of miRNAs. The user can restrict the analysis to the preferred tissue or cell line. miRror is suitable for analyzing results from miRNA profiling, proteomics and gene expression arrays. Availability: http://www.proto.cs.huji.ac.il/mirror
  • Article
    microRNAs are short RNA fragments that are capable of regulating the expression of hundreds of target genes. Currently, due to the lack of high-throughput experimental methods for miRNA target identification, a collection of computational target prediction approaches has been developed. However, these approaches consider different features, or weight the same factors differently, resulting in a diverse range of predictions. The prediction accuracy remains uncertain. In this paper, three commonly used target prediction algorithms are evaluated and further integrated using algorithm combination, ranking aggregation and Bayesian Network classification. Our results revealed that each individual prediction algorithm displays its own advantages, as was shown on different test data sets. Among the different integration strategies, the application of a Bayesian Network classifier to the features calculated from multiple prediction methods significantly improved target prediction accuracy.
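Two of the integration strategies named in this abstract, algorithm combination and ranking aggregation, can be illustrated with a minimal sketch. It is not the authors' implementation: the tool labels, gene names and scores are invented placeholders, higher scores are assumed to mean more confident predictions, combination is taken to mean set union or intersection, and a simple mean-rank rule stands in for whatever aggregation the paper used. The Bayesian Network step is not covered.

```python
from statistics import mean

# Hypothetical prediction scores (higher = more confident); not real data.
predictions = {
    "tool_a": {"geneX": 0.9, "geneY": 0.4, "geneZ": 0.7},
    "tool_b": {"geneX": 0.8, "geneZ": 0.6},
    "tool_c": {"geneY": 0.5, "geneX": 0.7, "geneW": 0.2},
}


def combine(preds, mode="union"):
    """Algorithm combination: targets reported by any tool (union) or by all tools (intersection)."""
    target_sets = [set(scores) for scores in preds.values()]
    if mode == "union":
        return set.union(*target_sets)
    return set.intersection(*target_sets)


def ranks(scores):
    """Rank targets within one tool, 1 = highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {target: i + 1 for i, target in enumerate(ordered)}


def mean_rank_aggregate(preds):
    """Order all targets by their mean rank across tools.

    A target that a tool did not report gets that tool's worst rank + 1
    (an assumption made for this sketch). Ties are broken by name.
    """
    all_targets = combine(preds, "union")
    per_tool = {}
    for name, scores in preds.items():
        tool_ranks = ranks(scores)
        missing_rank = len(scores) + 1
        per_tool[name] = {t: tool_ranks.get(t, missing_rank) for t in all_targets}
    aggregated = {t: mean(per_tool[name][t] for name in preds) for t in all_targets}
    return sorted(aggregated, key=lambda t: (aggregated[t], t))
```

Calling `mean_rank_aggregate(predictions)` returns the targets ordered from best to worst consensus rank, while `combine(predictions, "intersection")` keeps only targets that every tool reports.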
  • Article
    Full-text available
    PRGdb is a web accessible open-source (http://www.prgdb.org) database that represents the first bioinformatic resource providing a comprehensive overview of resistance genes (R-genes) in plants. PRGdb holds more than 16 000 known and putative R-genes belonging to 192 plant species challenged by 115 different pathogens and linked with useful biological information. The complete database includes a set of 73 manually curated reference R-genes, 6308 putative R-genes collected from NCBI and 10463 computationally predicted putative R-genes. Thanks to a user-friendly interface, data can be examined using different query tools. A home-made prediction pipeline called Disease Resistance Analysis and Gene Orthology (DRAGO), based on reference R-gene sequence data, was developed to search for plant resistance genes in public datasets such as Unigene and Genbank. New putative R-gene classes containing unknown domain combinations were discovered and characterized. The development of the PRG platform represents an important starting point to conduct various experimental tasks. The inferred cross-link between genomic and phenotypic information allows access to a large body of information to find answers to several biological questions. The database structure also permits easy integration with other data types and opens up prospects for future implementations.
  • Article
    The Gene Ontology (GO) project (http://www.geneontology.org) develops and uses a set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences (also see http://song.sourceforge.net/). The GO Consortium continues to improve the vocabulary content, reflecting the impact of several novel mechanisms of incorporating community input. A growing number of model organism databases and genome annotation groups contribute annotation sets using GO terms to GO's public repository. Updates to the AmiGO browser have improved access to contributed genome annotations. As the GO project continues to grow, the use of the GO vocabularies is becoming more varied as well as more widespread. The GO project provides an ontological annotation system that enables biologists to infer knowledge from large amounts of data.
  • Article
    Full-text available
    MicroRNAs (miRNA) are approximately 21 nucleotide-long non-coding small RNAs, which function as post-transcriptional regulators in eukaryotes. miRNAs play essential roles in regulating plant growth and development. In recent years, research into the mechanism and consequences of miRNA action has made great progress. With whole genome sequence available in such plants as Arabidopsis thaliana, Oryza sativa, Populus trichocarpa, Glycine max, etc., it is desirable to develop a plant miRNA database through the integration of large amounts of information about publicly deposited miRNA data. The plant miRNA database (PMRD) integrates available plant miRNA data deposited in public databases, gleaned from the recent literature, and data generated in-house. This database contains sequence information, secondary structure, target genes, expression profiles and a genome browser. In total, there are 8433 miRNAs collected from 121 plant species in PMRD, including model plants and major crops such as Arabidopsis, rice, wheat, soybean, maize, sorghum, barley, etc. For Arabidopsis, rice, poplar, soybean, cotton, medicago and maize, we included the possible target genes for each miRNA with a predicted interaction site in the database. Furthermore, we provided miRNA expression profiles in the PMRD, including our local rice oxidative stress related microarray data (LC Sciences miRPlants_10.1) and the recently published microarray data for poplar, Arabidopsis, tomato, maize and rice. The PMRD database was constructed by open source technology utilizing a user-friendly web interface, and multiple search tools. The PMRD is freely available at http://bioinformatics.cau.edu.cn/PMRD. We expect PMRD to be a useful tool for scientists in the miRNA field in order to study the function of miRNAs and their target genes, especially in model plants and major crops.
  • Article
    Full-text available
    MicroRNAs (miRNAs) are 20–24 nt long endogenous non-coding RNAs that act as post-transcriptional regulators in metazoa and plants. Plant miRNA targets typically contain a single sequence motif with near-perfect complementarity to the miRNA. Here, we extended and applied the program RNAhybrid to identify novel miRNA targets in the complete annotated Arabidopsis thaliana transcriptome. RNAhybrid predicts the energetically most favorable miRNA:mRNA hybrids that are consistent with user-defined structural constraints. These were: (i) perfect base pairing of the duplex from nucleotide 8 to 12 counting from the 5′-end of the miRNA; (ii) loops with a maximum length of one nucleotide in either strand; (iii) bulges with no more than one nucleotide in size; and (iv) unpaired end overhangs not longer than two nucleotides. G:U base pairs are not treated as mismatches, but contribute less favorably to the overall free energy. The resulting hybrids were filtered according to their minimum free energy, resulting in an overall prediction of more than 600 novel miRNA targets. The specificity and signal-to-noise ratio of the prediction were assessed with either randomized miRNAs or randomized target sequences as negative controls. Our results are in line with recent observations that the majority of miRNA targets are not transcription factors.
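Constraint (i) above can be made concrete with a small sketch. It is not RNAhybrid and computes no free energies: it assumes an ungapped, antiparallel duplex between a miRNA and a candidate site of equal length (so loops, bulges and overhangs are not modeled), and the function names and the `allow_gu` switch are illustrative choices. Whether G:U pairs should count inside positions 8 to 12 is left as a parameter, since the abstract only says they are not treated as mismatches.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
COMPLEMENT_BASE = {"A": "U", "U": "A", "G": "C", "C": "G"}


def normalize(seq):
    """Upper-case the sequence and read DNA-style T as U."""
    return seq.upper().replace("T", "U")


def reverse_complement(seq):
    """Perfectly complementary site for a sequence, both written 5'->3'."""
    return "".join(COMPLEMENT_BASE[base] for base in reversed(normalize(seq)))


def duplex_pairs(mirna, site):
    """Pair miRNA position i (counted from its 5' end) with the antiparallel site base."""
    mirna, site = normalize(mirna), normalize(site)
    if len(mirna) != len(site):
        raise ValueError("this sketch only handles ungapped, equal-length duplexes")
    return list(zip(mirna, reversed(site)))


def core_is_paired(mirna, site, start=8, end=12, allow_gu=True):
    """True if miRNA positions start..end (1-based, inclusive) are all base-paired."""
    allowed = WATSON_CRICK | WOBBLE if allow_gu else WATSON_CRICK
    pairs = duplex_pairs(mirna, site)
    return all(pair in allowed for pair in pairs[start - 1:end])
```

For example, a site built with `reverse_complement(mirna)` passes `core_is_paired`, whereas substituting a non-pairing base opposite miRNA position 10 makes the check fail.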
  • Article
    Full-text available
    Motivation: MicroRNAs (miRNAs) play important roles in gene regulation and are regarded as key components in gene regulatory pathways. Systematically understanding functional roles of miRNAs is essential to define core transcriptional units regulating key biological processes. Here, we propose a method based on the probabilistic graphical model to identify the regulatory modules of miRNAs and the core regulatory motifs involved in their ability to regulate gene expression. Results: We applied our method to datasets of different sources from Arabidopsis consisting of miRNA-target pair information, upstream sequences of miRNAs, transcriptional regulatory motifs and gene expression profiles. The graphical model used in this study can efficiently capture the relationship between miRNAs and diverse conditions such as various developmental processes, thus allowing us to detect functionally correlated miRNA regulatory modules involved in specific biological processes. Furthermore, this approach can reveal core transcriptional elements associated with their miRNAs. The proposed method found clusters of miRNAs, as well as putative regulators controlling the expression of miRNAs, which were highly related to diverse developmental processes of Arabidopsis. Consequently, our method can provide hypothetical miRNA regulatory circuits for functional testing that represent transcriptional events of miRNAs and transcriptional factors involved in gene regulatory pathways.
  • Article
    Full-text available
    Arabidopsis thaliana is the most widely-studied plant today. The concerted efforts of over 11 000 researchers and 4000 organizations around the world are generating a rich diversity and quantity of information and materials. This information is made available through a comprehensive on-line resource called the Arabidopsis Information Resource (TAIR) (http://arabidopsis.org), which is accessible via commonly used web browsers and can be searched and downloaded in a number of ways. In the last two years, efforts have been focused on increasing data content and diversity, functionally annotating genes and gene products with controlled vocabularies, and improving data retrieval, analysis and visualization tools. New information includes sequence polymorphisms including alleles, germplasms and phenotypes, Gene Ontology annotations, gene families, protein information, metabolic pathways, gene expression data from microarray experiments and seed and DNA stocks. New data visualization and analysis tools include SeqViewer, which interactively displays the genome from the whole chromosome down to 10 kb of nucleotide sequence, and AraCyc, a metabolic pathway database and map tool that allows overlaying expression data onto the pathway diagrams. Finally, we have recently incorporated seed and DNA stock information from the Arabidopsis Biological Resource Center (ABRC) and implemented a shopping-cart style on-line ordering system.
  • Article
    Full-text available
    The recent discoveries of microRNA (miRNA) genes and characterization of the first few target genes regulated by miRNAs in Caenorhabditis elegans and Drosophila melanogaster have set the stage for elucidation of a novel network of regulatory control. We present a computational method for whole-genome prediction of miRNA target genes. The method is validated using known examples. For each miRNA, target genes are selected on the basis of three properties: sequence complementarity using a position-weighted local alignment algorithm, free energies of RNA-RNA duplexes, and conservation of target sites in related genomes. Application to the D. melanogaster, Drosophila pseudoobscura and Anopheles gambiae genomes identifies several hundred target genes potentially regulated by one or more known miRNAs. These potential targets are rich in genes that are expressed at specific developmental stages and that are involved in cell fate specification, morphogenesis and the coordination of developmental processes, as well as genes that are active in the mature nervous system. High-ranking target genes are enriched in transcription factors two-fold and include genes already known to be under translational regulation. Our results reaffirm the thesis that miRNAs have an important role in establishing the complex spatial and temporal patterns of gene activity necessary for the orderly progression of development and suggest additional roles in the function of the mature organism. In addition the results point the way to directed experiments to determine miRNA functions. The emerging combinatorics of miRNA target sites in the 3' untranslated regions of messenger RNAs are reminiscent of transcriptional regulation in promoter regions of DNA, with both one-to-many and many-to-one relationships between regulator and target. Typically, more than one miRNA regulates one message, indicative of cooperative translational control. 
Conversely, one miRNA may have several target genes, reflecting target multiplicity. As a guide to focused experiments, we provide detailed online information about likely target genes and binding sites in their untranslated regions, organized by miRNA or by gene and ranked by likelihood of match. The target prediction algorithm is freely available and can be applied to whole genome sequences using identified miRNA sequences.
  • Article
    MicroRNAs (miRNAs) are endogenous approximately 22 nt RNAs that can play important regulatory roles in animals and plants by targeting mRNAs for cleavage or translational repression. Although they escaped notice until relatively recently, miRNAs comprise one of the more abundant classes of gene regulatory molecules in multicellular organisms and likely influence the output of many protein-coding genes.
  • Article
    MicroRNAs (miRNAs) and short interfering RNAs (siRNAs) are small noncoding RNAs that have recently emerged as important regulators of mRNA degradation, translational repression, and chromatin modification. In Arabidopsis thaliana, 43 miRNAs comprising 15 families have been reported thus far. In an attempt to identify novel and abiotic stress regulated miRNAs and siRNAs, we constructed a library of small RNAs from Arabidopsis seedlings exposed to dehydration, salinity, or cold stress or to the plant stress hormone abscisic acid. Sequencing of the library and subsequent analysis revealed 26 new miRNAs from 34 loci, forming 15 new families. Two of the new miRNAs from three loci are members of previously reported miR171 and miR319 families. Some of the miRNAs are preferentially expressed in specific tissues, and several are either upregulated or downregulated by abiotic stresses. Ten of the miRNAs are highly conserved in other plant species. Fifty-one potential targets with diverse function were predicted for the newly identified miRNAs based on sequence complementarity. In addition to miRNAs, we identified 102 other novel endogenous small RNAs in Arabidopsis. These findings suggest that a large number of miRNAs and other small regulatory RNAs are encoded by the Arabidopsis genome and that some of them may play important roles in plant responses to environmental stresses as well as in development and genome maintenance.
  • Article
    Full-text available
    MicroRNAs (miRNAs) interact with target mRNAs at specific sites to induce cleavage of the message or inhibit translation. The specific function of most mammalian miRNAs is unknown. We have predicted target sites on the 3' untranslated regions of human gene transcripts for all currently known 218 mammalian miRNAs to facilitate focused experiments. We report about 2,000 human genes with miRNA target sites conserved in mammals and about 250 human genes conserved as targets between mammals and fish. The prediction algorithm optimizes sequence complementarity using position-specific rules and relies on strict requirements of interspecies conservation. Experimental support for the validity of the method comes from known targets and from strong enrichment of predicted targets in mRNAs associated with the fragile X mental retardation protein in mammals. This is consistent with the hypothesis that miRNAs act as sequence-specific adaptors in the interaction of ribonuclear particles with translationally regulated messages. Overrepresented groups of targets include mRNAs coding for transcription factors, components of the miRNA machinery, and other proteins involved in translational regulation, as well as components of the ubiquitin machinery, representing novel feedback loops in gene regulation. Detailed information about target genes, target processes, and open-source software for target prediction (miRanda) is available at http://www.microrna.org. Our analysis suggests that miRNA genes, which are about 1% of all human genes, regulate protein production for 10% or more of all human genes.
  • Article
    Full-text available
    Eukaryotes produce functionally diverse classes of small RNAs (20–25 nt). These include microRNAs (miRNAs), which act as regulatory factors during growth and development, and short-interfering RNAs (siRNAs), which function in several epigenetic and post-transcriptional silencing systems. The Arabidopsis Small RNA Project (ASRP) seeks to characterize and functionally analyze the major classes of endogenous small RNAs in plants. The ASRP database provides a repository for sequences of small RNAs cloned from various Arabidopsis genotypes and tissues. Version 3.0 of the database contains 1920 unique sequences, with tools to assist in miRNA and siRNA identification and analysis. The comprehensive database is publicly available through a web interface at http://asrp.cgrb.oregonstate.edu.
  • Article
    We predict regulatory targets of vertebrate microRNAs (miRNAs) by identifying mRNAs with conserved complementarity to the seed (nucleotides 2-7) of the miRNA. An overrepresentation of conserved adenosines flanking the seed complementary sites in mRNAs indicates that primary sequence determinants can supplement base pairing to specify miRNA target recognition. In a four-genome analysis of 3' UTRs, approximately 13,000 regulatory relationships were detected above the estimate of false-positive predictions, thereby implicating as miRNA targets more than 5300 human genes, which represented 30% of our gene set. Targeting was also detected in open reading frames. In sum, well over one third of human genes appear to be conserved miRNA targets.
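The seed-matching idea in this abstract can be sketched as follows. This is a simplified illustration only: it takes the seed as miRNA nucleotides 2-7, looks for perfectly complementary sites in a single sequence, and does not model cross-species conservation, flanking adenosines, or the false-positive estimate described above. The function names are hypothetical.

```python
# Translation table for RNA complements: A<->U, C<->G.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna):
    """Return the mRNA motif (5'->3') complementary to miRNA nucleotides 2-7."""
    seed = mirna.upper().replace("T", "U")[1:7]  # 1-based positions 2..7
    return seed.translate(COMPLEMENT)[::-1]      # reverse complement


def find_seed_matches(mirna, utr):
    """Return 0-based start positions of every seed-complementary site in the UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)  # step by 1 so overlapping sites are found
    return hits
```

For a miRNA such as `UGAGGUAGUAGGUUGUAUAGUU`, `seed_site` yields the motif `UACCUC`, and `find_seed_matches` lists where that motif occurs in a given 3' UTR sequence.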
  • Article
    Full-text available
    MicroRNAs (miRNAs) are small endogenous RNAs that can play important regulatory roles via the RNA-interference pathway by targeting mRNAs for cleavage or translational repression. We propose a computational method to predict miRNA regulatory modules (MRMs) or groups of miRNAs and target genes that are believed to participate cooperatively in post-transcriptional gene regulation. We tested our method with the human genes and miRNAs, predicting 431 MRMs. We analyze a module with genes: BTG2, WT1, PPM1D, PAK7 and RAB9B, and miRNAs: miR-15a and miR-16. Review of the literature and annotation with Gene Ontology terms reveal that the roles of these genes can indeed be closely related in specific biological processes, such as gene regulation involved in breast, renal and prostate cancers. Furthermore, it has been reported that miR-15a and miR-16 are deleted together in certain types of cancer, suggesting a possible connection between these miRNAs and cancers. Given that most known functionalities of miRNAs are related to negative gene regulation, extending our approach and exploiting the insight thus obtained may provide clues to achieving practical accuracy in the reverse-engineering of gene regulatory networks. A list of predicted modules is available from the authors upon request.
  • Article
    Full-text available
    The TRANSFAC® database on transcription factors, their binding sites, nucleotide distribution matrices and regulated genes as well as the complementing database TRANSCompel® on composite elements have been further enhanced on various levels. A new web interface with different search options and integrated versions of Match™ and Patch™ provides increased functionality for TRANSFAC®. The list of databases which are linked to the common GENE table of TRANSFAC® and TRANSCompel® has been extended by: Ensembl, UniGene, EntrezGene, HumanPSD™ and TRANSPRO™. Standard gene names from HGNC, MGI and RGD, are included for human, mouse and rat genes, respectively. With the help of InterProScan, Pfam, SMART and PROSITE domains are assigned automatically to the protein sequences of the transcription factors. TRANSCompel® contains now, in addition to the COMPEL table, a separate table for detailed information on the experimental EVIDENCE on which the composite elements are based. Finally, for TRANSFAC®, in respect of data growth, in particular the gain of Drosophila transcription factor binding sites (by courtesy of the Drosophila DNase I footprint database) and of Arabidopsis factors (by courtesy of DATF, Database of Arabidopsis Transcription Factors) has to be stressed. The here described public releases, TRANSFAC® 7.0 and TRANSCompel® 7.0, are accessible under http://www.gene-regulation.com/pub/databases.html.
  • Article
    Full-text available
    To be effective in vivo, antisense oligonucleotides (AS ON) should be nuclease resistant, form stable ON/RNA duplexes and support ribonuclease H mediated heteroduplex cleavage, all with negligible non-specific effects on cell function. We report herein that AS ONs containing a 2′-deoxy-2′-fluoro-β-d-arabinonucleic acid (2′F-ANA) sugar modification not only meet these criteria, but have the added advantage of maintaining high intracellular concentrations for prolonged periods of time, which appears to promote longer term gene silencing. To demonstrate this, we targeted the c-MYB protooncogene's mRNA in human leukemia cells with fully phosphorothioated 2′F-ANA–DNA chimeras (PS-2′FANA–DNA) and compared their gene silencing efficiency with AS ON containing unmodified nucleosides (PS-DNA). When delivered by nucleofection, chemically modified ON of both types effected a >90% knockdown of c-MYB mRNA and protein expression, but the PS-2′F-ANA–DNA were able to accomplish this at 20% of the dose of the PS-DNA, and in contrast to the PS-AS DNA, their silencing effect was still present 4 days after a single administration. Therefore, our data demonstrate that PS-2′F-ANA–DNA chimeras are efficient gene silencing molecules, and suggest that they could have significant therapeutic potential.
  • Article
    Full-text available
    In the elucidation of the microRNA regulatory network, knowledge of potential targets is of highest importance. Among existing target prediction methods, RNAhybrid [M. Rehmsmeier, P. Steffen, M. Höchsmann and R. Giegerich (2004) RNA, 10, 1507–1517] is unique in offering a flexible online prediction. Recently, some useful features have been added, among these the possibility to disallow G:U base pairs in the seed region, and a seed-match speed-up, which accelerates the program by a factor of 8. In addition, the program can now be used as a webservice for remote calls from user-implemented programs. We demonstrate RNAhybrid's flexibility with the prediction of a non-canonical target site for Caenorhabditis elegans miR-241 in the 3′-untranslated region of lin-39. RNAhybrid is available at http://bibiserv.techfak.uni-bielefeld.de/rnahybrid.
  • Article
    Full-text available
    MicroRNAs (miRNAs) are small noncoding RNAs, which play significant roles as posttranscriptional regulators. The functions of animal miRNAs are generally based on complementarity for their 5' components. Although several computational miRNA target-gene prediction methods have been proposed, they still have limitations in revealing actual target genes. We implemented miTarget, a support vector machine (SVM) classifier for miRNA target gene prediction. It uses a radial basis function kernel as a similarity measure for SVM features, categorized by structural, thermodynamic, and position-based features. The latter features are introduced in this study for the first time and reflect the mechanism of miRNA binding. The SVM classifier produces high performance with a biologically relevant data set obtained from the literature, compared with previous tools. We predicted significant functions for human miR-1, miR-124a, and miR-373 using Gene Ontology (GO) analysis and revealed the importance of pairing at positions 4, 5, and 6 in the 5' region of a miRNA from a feature selection experiment. We also provide a web interface for the program. miTarget is a reliable miRNA target gene prediction tool and is a successful application of an SVM classifier. Compared with previous tools, its predictions are meaningful by GO analysis and its performance can be improved given more training examples.
  • Article
    Compositionally biased (CB) regions are stretches in protein sequences composed mainly of a distinct subset of amino acid residues; such regions are frequently associated with a structural role in the cell, or with protein disorder. We derived a procedure for the exhaustive assignment and classification of CB regions, and have applied it to thirteen metazoan proteomes. Sequences are initially scanned for the lowest-probability subsequences (LPSs) for single amino-acid types; subsequently, an exhaustive search for LPSs for multiple residue types is performed iteratively until convergence, to define CB region boundaries. We analysed > 40,000 CB regions with > 20 million residues; strikingly, nine single-/double-residue biases are universally abundant, and are consistently highly ranked across both vertebrates and invertebrates. To home in on subpopulations of CB regions of interest in human and D. melanogaster, we analysed CB region lengths, conservation, inferred functional categories and predicted protein disorder, and filtered for coiled coils and protein structures. In particular, we found that some of the universally abundant CB regions have significant associations with transcription and nuclear localization in human and Drosophila, and are also predicted to be moderately or highly disordered. Focussing on Q-based biased regions, we found that these regions are typically only well conserved within mammals (appearing in 60-80% of orthologs), with shorter human transcription-related CB regions being unconserved outside of mammals; they are also preferentially linked to protein domains such as the homeodomain and glucocorticoid-receptor DNA-binding domain. In general, only approximately 40-50% of residues in these human and Drosophila CB regions have predicted protein disorder. These data are of use for the further functional characterization of genes, and for structural genomics initiatives.
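The single-residue scan described above can be sketched as a brute-force search over windows. The abstract does not state how subsequence probability is computed, so this sketch assumes a binomial tail probability against a background residue frequency `p`; the function names, the window-length limits and the example sequence are likewise hypothetical.

```python
from math import comb


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def lowest_probability_subsequence(seq, residue, p, min_len=5, max_len=50):
    """Return (start, end, probability) of the window least likely by chance.

    `end` is exclusive. The binomial model is an assumption of this sketch.
    """
    best = None
    for start in range(len(seq)):
        count = 0
        for end in range(start + 1, min(len(seq), start + max_len) + 1):
            count += seq[end - 1] == residue  # running count of the residue
            if end - start < min_len:
                continue
            prob = binomial_tail(end - start, count, p)
            if best is None or prob < best[2]:
                best = (start, end, prob)
    return best


sequence = "MKT" + "Q" * 10 + "LVA"  # toy sequence with a poly-Q stretch
start, end, prob = lowest_probability_subsequence(sequence, "Q", p=0.05)
print(start, end, sequence[start:end])
```

On this toy input the lowest-probability window coincides with the poly-Q run, since extending or shrinking the window only raises the tail probability.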
  • Article
    MicroRNAs are key regulators of gene expression, but the precise mechanisms underlying their interaction with their mRNA targets are still poorly understood. Here, we systematically investigate the role of target-site accessibility, as determined by base-pairing interactions within the mRNA, in microRNA target recognition. We experimentally show that mutations diminishing target accessibility substantially reduce microRNA-mediated translational repression, with effects comparable to those of mutations that disrupt sequence complementarity. We devise a parameter-free model for microRNA-target interaction that computes the difference between the free energy gained from the formation of the microRNA-target duplex and the energetic cost of unpairing the target to make it accessible to the microRNA. This model explains the variability in our experiments, predicts validated targets more accurately than existing algorithms, and shows that genomes accommodate site accessibility by preferentially positioning targets in highly accessible regions. Our study thus demonstrates that target accessibility is a critical factor in microRNA function.
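The energy balance in the model above can be illustrated with a few lines of arithmetic. The sign convention here (both quantities given as positive magnitudes, with a larger net value meaning a more favourable site) is chosen for this sketch only, and the candidate sites and energy values are made up.

```python
def net_binding_energy(duplex_gain, opening_cost):
    """Energy gained by duplex formation minus the cost of unpairing the site.

    Both arguments are positive magnitudes in kcal/mol (a convention assumed
    for this sketch); a larger result means a more favourable interaction.
    """
    return duplex_gain - opening_cost


# Hypothetical candidate sites: (name, duplex gain, cost of opening the target).
candidates = [
    ("site_in_open_region", 20.0, 5.0),
    ("site_in_structured_region", 22.0, 12.0),
]
ranked = sorted(
    candidates,
    key=lambda c: net_binding_energy(c[1], c[2]),
    reverse=True,
)
for name, gain, cost in ranked:
    print(name, net_binding_energy(gain, cost))
```

In this made-up example the site with the slightly weaker duplex ranks first because it sits in a more accessible region, which is the effect the abstract emphasizes.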
  • Article
    Motivation: Most computational methodologies for miRNA:mRNA target gene prediction use the seed segment of the miRNA and require cross-species sequence conservation in this region of the mRNA target. Methods that do not rely on conservation generate too many predictions to validate. We describe a target prediction method (NBmiRTar) that does not require sequence conservation and instead uses machine learning with a naïve Bayes classifier. It generates a model from sequence and miRNA:mRNA duplex information from validated targets and artificially generated negative examples. Both the 'seed' and 'out-seed' segments of the miRNA:mRNA duplex are used for target identification. Results: The application of machine-learning techniques to the features we have used is a useful and general approach for microRNA target gene prediction. Our technique produces fewer false positive predictions and fewer target candidates to be tested. It exhibits higher sensitivity and specificity than algorithms that rely on conserved genomic regions to decrease false positive predictions.
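As a rough illustration of the classifier type named above, here is a minimal Bernoulli naive Bayes over binary features. It is not the NBmiRTar implementation: the two features (a seed match flag and an out-seed pairing flag), the toy training data and the Laplace smoothing constant are all assumptions for this sketch.

```python
import math


def train_naive_bayes(samples, labels, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on tuples of 0/1 features."""
    n_features = len(samples[0])
    model = {}
    for cls in sorted(set(labels)):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        prior = len(rows) / len(samples)
        # Laplace-smoothed estimate of P(feature_j = 1 | class)
        probs = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[cls] = (prior, probs)
    return model


def predict_naive_bayes(model, x):
    """Return the class with the highest log-posterior for feature tuple x."""
    best_cls, best_score = None, -math.inf
    for cls, (prior, probs) in model.items():
        score = math.log(prior)
        for xj, pj in zip(x, probs):
            score += math.log(pj if xj else 1.0 - pj)
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


# Hypothetical features per duplex: (seed_match, out_seed_pairing).
# Label 1 = validated target, 0 = artificially generated negative.
X = [(1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (1, 0), (0, 0)]
y = [1, 1, 1, 1, 0, 0, 0, 0]
model = train_naive_bayes(X, y)
print(predict_naive_bayes(model, (1, 1)))
print(predict_naive_bayes(model, (0, 0)))
```

Working in log space avoids numerical underflow when many features are multiplied together, and the smoothing term keeps unseen feature values from zeroing out a class.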
  • Article
    miRBase is the central online repository for microRNA (miRNA) nomenclature, sequence data, annotation and target prediction. The current release (10.0) contains 5071 miRNA loci from 58 species, expressing 5922 distinct mature miRNA sequences: a growth of over 2000 sequences in the past 2 years. miRBase provides a range of data to facilitate studies of miRNA genomics: all miRNAs are mapped to their genomic coordinates. Clusters of miRNA sequences in the genome are highlighted, and can be defined and retrieved with any inter-miRNA distance. The overlap of miRNA sequences with annotated transcripts, both protein-coding and non-coding, is described. Finally, graphical views of the locations of a wide range of genomic features in model organisms allow, for the first time, the prediction of the likely boundaries of many miRNA primary transcripts. miRBase is available at http://microrna.sanger.ac.uk/.
  • Article
    The Biological General Repository for Interaction Datasets (BioGRID) database (http://www.thebiogrid.org) was developed to house and distribute collections of protein and genetic interactions from major model organism species. BioGRID currently contains over 198 000 interactions from six different species, as derived from both high-throughput studies and conventional focused studies. Through comprehensive curation efforts, BioGRID now includes a virtually complete set of interactions reported to date in the primary literature for both the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe. A number of new features have been added to the BioGRID including an improved user interface to display interactions based on different attributes, a mirror site and a dedicated interaction management system to coordinate curation across different locations. The BioGRID provides interaction data with monthly updates to Saccharomyces Genome Database, Flybase and Entrez Gene. Source code for the BioGRID and the linked Osprey network visualization system is now freely available without restriction.
  • Article
    MicroRNAs (miRNAs) are short noncoding RNAs that are involved in the regulation of thousands of gene targets. Recent studies indicate that miRNAs are likely to be master regulators of many important biological processes. Due to their functional importance, miRNAs are under intense study at present, and many studies have been published in recent years on miRNA functional characterization. The rapid accumulation of miRNA knowledge makes it challenging to properly organize and present miRNA function data. Although several miRNA functional databases have been developed recently, this remains a major bioinformatics challenge for the miRNA research community. Here, we describe a new online database system, miRDB, for miRNA target prediction and functional annotation. A flexible web search interface was developed for the retrieval of target prediction results, which were generated with a new bioinformatics algorithm we developed recently. Unlike most other miRNA databases, miRNA functional annotations in miRDB are presented with a primary focus on mature miRNAs, which are the functional carriers of miRNA-mediated gene expression regulation. In addition, a wiki editing interface was established to allow anyone with Internet access to contribute to miRNA functional annotation. This is a new attempt to develop an interactive community-annotated miRNA functional catalog. All data stored in miRDB are freely accessible at http://mirdb.org.