Learning Cellular Sorting Pathways Using Protein Interactions and Sequence Motifs

Language Technology Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.
Journal of computational biology: a journal of computational molecular cell biology (Impact Factor: 1.74). 11/2011; 18(11):1709-22. DOI: 10.1089/cmb.2011.0193
Source: PubMed


Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations before it reaches its final destination. The pathway through which a protein is transported is determined by carrier proteins that bind to specific sequence motifs. In this article, we present a new method that integrates protein interaction and sequence motif data to model how proteins are sorted through these pathways. We use a hidden Markov model (HMM) to represent protein sorting pathways. The model is able to determine intermediate sorting states and to assign carrier proteins and motifs to the sorting pathways. In simulation studies, we show that the method can accurately recover an underlying sorting model. Using data for yeast, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the path they utilize. The learned model also identified new pathways, together with their putative carriers and motifs, which may represent novel protein sorting mechanisms. Supplementary results and software implementation are available from
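
To make the idea of an HMM over sorting pathways concrete, here is a minimal sketch of generic Viterbi decoding on a toy model. It is not the authors' model or software: the compartment names, the feature names (`signal_peptide`, `cargo_receptor`, `vacuolar_motif`, `membrane_anchor`), all probabilities, and the assumption that each sorting step emits exactly one observed feature are hypothetical choices made only for illustration.

```python
import math

# Hypothetical hidden states (compartments); not taken from the paper.
STATES = ["ER", "Golgi", "vacuole", "plasma_membrane"]

# Hypothetical parameters: every toy protein starts in the ER.
START = {"ER": 1.0}
TRANS = {
    "ER": {"ER": 0.2, "Golgi": 0.8},
    "Golgi": {"Golgi": 0.2, "vacuole": 0.4, "plasma_membrane": 0.4},
    "vacuole": {"vacuole": 1.0},
    "plasma_membrane": {"plasma_membrane": 1.0},
}
# Probability that a compartment "emits" a given carrier/motif feature.
EMIT = {
    "ER": {"signal_peptide": 0.8, "cargo_receptor": 0.2},
    "Golgi": {"cargo_receptor": 0.7, "vacuolar_motif": 0.15,
              "membrane_anchor": 0.15},
    "vacuole": {"vacuolar_motif": 0.9, "cargo_receptor": 0.1},
    "plasma_membrane": {"membrane_anchor": 0.9, "cargo_receptor": 0.1},
}


def _log(p):
    """Log of a probability, mapping zero to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start, trans, emit):
    """Return the most probable state path and its log-probability."""
    if not observations:
        raise ValueError("observations must be non-empty")

    # Initialise with the start distribution and the first emission.
    score = {
        s: _log(start.get(s, 0.0)) + _log(emit[s].get(observations[0], 0.0))
        for s in states
    }
    backpointers = []

    for obs in observations[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            # Best predecessor for state s at this step.
            best = max(states, key=lambda r: prev[r] + _log(trans[r].get(s, 0.0)))
            score[s] = (prev[best] + _log(trans[best].get(s, 0.0))
                        + _log(emit[s].get(obs, 0.0)))
            pointers[s] = best
        backpointers.append(pointers)

    # Trace back from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


if __name__ == "__main__":
    features = ["signal_peptide", "cargo_receptor", "vacuolar_motif"]
    path, logp = viterbi(features, STATES, START, TRANS, EMIT)
    print(path, math.exp(logp))
```

In this toy setting the decoded path plays the role of a sorting pathway and its last state plays the role of the predicted localization. Learning the transition and emission parameters from interaction and motif data, which is the focus of the article, is not shown here.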



Available from: Robert F Murphy
  • ABSTRACT: Detecting epistatic interactions plays a significant role in understanding the pathogenesis of complex human diseases and in improving their prevention, diagnosis and treatment. A recent study in automatic detection of epistatic interactions shows that Markov Blanket-based methods are capable of finding genetic variants strongly associated with common diseases and reducing false positives when the number of instances is large. Unfortunately, a typical dataset from genome-wide association studies consists of a very limited number of examples, where current methods, including Markov Blanket-based methods, may perform poorly. To address small sample problems, we propose a Bayesian network-based approach (bNEAT) to detect epistatic interactions. The proposed method also employs a Branch-and-Bound technique for learning. We apply the proposed method to simulated datasets based on four disease models and a real dataset. Experimental results show that our method outperforms Markov Blanket-based methods and other commonly used methods, especially when the number of samples is small. Our results show that bNEAT achieves strong power regardless of the number of samples and is especially suitable for detecting epistatic interactions with slight or no marginal effects. The merits of the proposed approach lie in two aspects: a suitable score for Bayesian network structure learning that can reflect higher-order epistatic interactions, and a heuristic Bayesian network structure learning method.
    Full-text · Article · Jul 2011 · BMC Genomics
  • ABSTRACT: Nuclear receptors (NRs) are members of a large superfamily of evolutionarily related DNA-binding transcription factors. They regulate diverse functions, such as homeostasis, reproduction, development and metabolism. As nuclear receptors bind small molecules that can easily be modified by drug design, and control functions associated with major diseases (e.g. cancer, osteoporosis and diabetes), they are promising pharmacological targets. According to their different action mechanisms or functions, the NR superfamily has been classified into seven families: NR1 (thyroid hormone like), NR2 (HNF4-like), NR3 (estrogen like), NR4 (nerve growth factor IB-like), NR5 (fushi tarazu-F1 like), NR6 (germ cell nuclear factor like), and NR0 (knirps or DAX like). With the avalanche of protein sequences generated in the postgenomic age, scientists are facing the following challenging problems. Given an uncharacterized protein sequence, how can we identify whether it is a nuclear receptor? If it is, to which family, or even subfamily, does it belong? To address these problems, many cheminformatics tools have been developed for nuclear receptor prediction. The current review is mainly focused on this field, including the functions, computational methods and limitations of these tools.
    Full-text · Article · May 2013 · Current topics in medicinal chemistry
  • ABSTRACT: With the explosion of protein sequences generated in the postgenomic era, the gap between the number of attribute-known proteins and that of uncharacterized ones has become increasingly large. Knowing the key attributes of proteins is a shortcut for prioritizing drug targets and developing novel drugs. Unfortunately, it is both time-consuming and costly to acquire these kinds of information purely by conducting biological experiments. Therefore, it is highly desirable to develop computational tools for quickly and effectively classifying proteins according to their sequence information alone. The process of developing these high-throughput tools generally involves the following procedures: (1) constructing benchmark datasets; (2) representing a protein sequence with a discrete numerical model; (3) developing or introducing a powerful algorithm or machine learning operator to conduct the prediction; (4) estimating the anticipated accuracy with a proper and objective test method; and (5) establishing a user-friendly web-server accessible to the public. This minireview is focused on recent progress in identifying the types of G-protein coupled receptors (GPCRs), subcellular localization of proteins, DNA-binding proteins and their binding sites. All these identification tools may provide very useful information for in-depth study of drug metabolism.
    Full-text · Article · Jul 2013 · Current topics in medicinal chemistry