Toward a Confocal Subcellular Atlas of the Human Proteome

Department of Biotechnology, AlbaNova University Center, Royal Institute of Technology, SE-106 91 Stockholm, Sweden.
Molecular & Cellular Proteomics (Impact Factor: 7.25). 04/2008; 7(3):499-508. DOI: 10.1074/mcp.M700325-MCP200
Source: PubMed

ABSTRACT Information on protein localization at the subcellular level is important to map and characterize the proteome and to better understand the cellular functions of proteins. Here we report on a pilot study of 466 proteins in three human cell lines, aimed at enabling large-scale confocal microscopy analysis using protein-specific antibodies. Approximately 3000 high-resolution images were generated, and more than 80% of the analyzed proteins could be classified into one or more subcellular compartments. The localizations of the proteins showed, in many cases, good agreement with the Gene Ontology localization prediction model. This is the first large-scale study to localize proteins into subcellular compartments using antibodies and confocal microscopy. The results suggest that this approach might be a valuable tool in conjunction with predictive models for protein localization.

Available from: Hjalmar Brismar, Aug 26, 2015
    • "Table 2. Dataset statistics

      Name                 Number of images  Number of classes  Reference
      RT-widefield         1382              10
      RT-confocal          304               10
      HeLa2D               862               10                 Boland and Murphy, 2001
      LOCATE-transfected   553               11                 Hamilton et al., 2007
      LOCATE-endogenous    502               10                 Hamilton et al., 2007
      Binucleate           41                2                  Shamir et al., 2008a
      CHO                  327               5                  Shamir et al., 2008a
      Terminalbulb         970               7                  Shamir et al., 2008a
      RNAi                 200               10                 Shamir et al., 2008a
      HPA                  1842              13                 Barbe et al., 2008"
    ABSTRACT: Evaluation of previous systems for automated determination of subcellular location from microscope images has been done using datasets in which each location class consisted of multiple images of the same representative protein. Here, we frame a more challenging and useful problem where previously unseen proteins are to be classified. Using CD-tagging, we generated two new image datasets for evaluation of this problem, which contain several different proteins for each location class. Evaluation of previous methods on these new datasets showed that it is much harder to train a classifier that generalizes across different proteins than one that simply recognizes a protein it was trained on. We therefore developed and evaluated additional approaches, incorporating novel modifications of local features techniques. These extended the notion of local features to exploit both the protein image and any reference markers that were imaged in parallel. With these, we obtained a large accuracy improvement in our new datasets over existing methods. Additionally, these features help achieve classification improvements for other previously studied datasets. The datasets are available for download at The software was written in Python and C++ and is available under an open-source license at The code is split into a library which can be easily reused for other data and a small driver script for reproducing all results presented here. A step-by-step tutorial on applying the methods to new datasets is also available at that address.
    Bioinformatics 07/2013; 29(18). DOI:10.1093/bioinformatics/btt392 · 4.62 Impact Factor
    • "HPA confocal images: the HPA ( is a rich source of location proteomic data (Barbe et al., 2008). It contains confocal immunofluorescence images for multiple cell lines stained for thousands of proteins with multiple reference channels. "
    ABSTRACT: Motivation: Knowledge of the subcellular location of a protein is crucial for understanding its functions. The subcellular pattern of a protein is typically represented as the set of cellular components in which it is located, and an important task is to determine this set from microscope images. In this article, we address this classification problem using confocal immunofluorescence images from the Human Protein Atlas (HPA) project. The HPA contains images of cells stained for many proteins; each is also stained for three reference components, but there are many other components that are invisible. Given one such cell, the task is to classify the pattern type of the stained protein. We first randomly select local image regions within the cells, and then extract various carefully designed features from these regions. This region-based approach enables us to explicitly study the relationship between proteins and different cell components, as well as the interactions between these components. To achieve these two goals, we propose two discriminative models that extend logistic regression with structured latent variables. The first model allows the same protein pattern class to be expressed differently according to the underlying components in different regions. The second model further captures the spatial dependencies between the components within the same cell so that we can better infer these components. To learn these models, we propose a fast approximate algorithm for inference, and then use gradient-based methods to maximize the data likelihood. Results: In the experiments, we show that the proposed models help improve the classification accuracies on synthetic data and real cellular images. The best overall accuracy we report in this article for classifying 942 proteins into 13 classes of patterns is about 84.6%, which to our knowledge is the best so far. In addition, the dependencies learned are consistent with prior knowledge of cell organization.
    Bioinformatics 06/2012; 28(12):i32-9. DOI:10.1093/bioinformatics/bts230 · 4.62 Impact Factor
    • "Recent advances in fluorescent microscopy coupled with automated image-based analysis methods provide rich information about the compartments to which proteins are localized in yeast (Huh et al., 2003; Chen et al., 2007) and human (Osuna et al., 2007; Barbe et al., 2008; Newberg et al., 2009). Several computational methods have been developed to predict subcellular localization by integrating sequence data with other types of high-throughput data (Chou and Shen, 2008; Horton et al., 2007; Emanuelsson et al., 2007, 2000; Nair and Rost, 2005; Scott et al., 2005; Rashid et al., 2007; Bannai et al., 2002). "
    ABSTRACT: Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations until reaching its final destination. The pathway a protein is transported through is determined by carrier proteins that bind to specific sequence motifs. In this article, we present a new method that integrates protein interaction and sequence motif data to model how proteins are sorted through these sorting pathways. We use a hidden Markov model (HMM) to represent protein sorting pathways. The model is able to determine intermediate sorting states and to assign carrier proteins and motifs to the sorting pathways. In simulation studies, we show that the method can accurately recover an underlying sorting model. Using data for yeast, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the path they utilize. The learned model identified new pathways and their putative carriers and motifs and these may represent novel protein sorting mechanisms. Supplementary results and software implementation are available from
    Journal of computational biology: a journal of computational molecular cell biology 11/2011; 18(11):1709-22. DOI:10.1089/cmb.2011.0193 · 1.67 Impact Factor