Article

Toward a confocal subcellular atlas of the human proteome.

Department of Biotechnology, AlbaNova University Center, Royal Institute of Technology, SE-106 91 Stockholm, Sweden.
Molecular & Cellular Proteomics (Impact Factor: 7.25). 04/2008; 7(3):499-508. DOI: 10.1074/mcp.M700325-MCP200
Source: PubMed

ABSTRACT Information on protein localization on the subcellular level is important to map and characterize the proteome and to better understand cellular functions of proteins. Here we report on a pilot study of 466 proteins in three human cell lines aimed to allow large scale confocal microscopy analysis using protein-specific antibodies. Approximately 3000 high resolution images were generated, and more than 80% of the analyzed proteins could be classified in one or multiple subcellular compartment(s). The localizations of the proteins showed, in many cases, good agreement with the Gene Ontology localization prediction model. This is the first large scale antibody-based study to localize proteins into subcellular compartments using antibodies and confocal microscopy. The results suggest that this approach might be a valuable tool in conjunction with predictive models for protein localization.

Available from: Hjalmar Brismar, Jun 24, 2015
  • Source
    ABSTRACT: Evaluation of previous systems for automated determination of subcellular location from microscope images has been done using datasets in which each location class consisted of multiple images of the same representative protein. Here, we frame a more challenging and useful problem where previously unseen proteins are to be classified. Using CD-tagging, we generated two new image datasets for evaluation of this problem, which contain several different proteins for each location class. Evaluation of previous methods on these new datasets showed that it is much harder to train a classifier that generalizes across different proteins than one that simply recognizes a protein it was trained on. We therefore developed and evaluated additional approaches, incorporating novel modifications of local features techniques. These extended the notion of local features to exploit both the protein image and any reference markers that were imaged in parallel. With these, we obtained a large accuracy improvement in our new datasets over existing methods. Additionally, these features help achieve classification improvements for other previously studied datasets. The datasets are available for download at http://murphylab.web.cmu.edu/data/. The software was written in Python and C++ and is available under an open-source license at http://murphylab.web.cmu.edu/software/. The code is split into a library which can be easily reused for other data and a small driver script for reproducing all results presented here. A step-by-step tutorial on applying the methods to new datasets is also available at that address. Contact: murphy@cmu.edu
    Bioinformatics 07/2013; DOI:10.1093/bioinformatics/btt392 · 4.62 Impact Factor
  • Source
    ABSTRACT: Motivation: Knowledge of the subcellular location of a protein is crucial for understanding its functions. The subcellular pattern of a protein is typically represented as the set of cellular components in which it is located, and an important task is to determine this set from microscope images. In this article, we address this classification problem using confocal immunofluorescence images from the Human Protein Atlas (HPA) project. The HPA contains images of cells stained for many proteins; each is also stained for three reference components, but there are many other components that are invisible. Given one such cell, the task is to classify the pattern type of the stained protein. We first randomly select local image regions within the cells, and then extract various carefully designed features from these regions. This region-based approach enables us to explicitly study the relationship between proteins and different cell components, as well as the interactions between these components. To achieve these two goals, we propose two discriminative models that extend logistic regression with structured latent variables. The first model allows the same protein pattern class to be expressed differently according to the underlying components in different regions. The second model further captures the spatial dependencies between the components within the same cell so that we can better infer these components. To learn these models, we propose a fast approximate algorithm for inference, and then use gradient-based methods to maximize the data likelihood. Results: In the experiments, we show that the proposed models help improve the classification accuracies on synthetic data and real cellular images. The best overall accuracy we report in this article for classifying 942 proteins into 13 classes of patterns is about 84.6%, which to our knowledge is the best so far. In addition, the dependencies learned are consistent with prior knowledge of cell organization. Availability: http://murphylab.web.cmu.edu/software/. Contact: Jeff.Schneider@cs.cmu.edu, murphy@cmu.edu
    Bioinformatics 06/2012; 28(12):i32-9. DOI:10.1093/bioinformatics/bts230 · 4.62 Impact Factor
  • Source
    ABSTRACT: Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations until reaching its final destination. The pathway a protein is transported through is determined by carrier proteins that bind to specific sequence motifs. In this article, we present a new method that integrates protein interaction and sequence motif data to model how proteins are sorted through these sorting pathways. We use a hidden Markov model (HMM) to represent protein sorting pathways. The model is able to determine intermediate sorting states and to assign carrier proteins and motifs to the sorting pathways. In simulation studies, we show that the method can accurately recover an underlying sorting model. Using data for yeast, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the path they utilize. The learned model identified new pathways and their putative carriers and motifs and these may represent novel protein sorting mechanisms. Supplementary results and software implementation are available from http://murphylab.web.cmu.edu/software/2010_RECOMB_pathways/.
    Journal of computational biology: a journal of computational molecular cell biology 11/2011; 18(11):1709-22. DOI:10.1089/cmb.2011.0193 · 1.67 Impact Factor