A Framework for the Automated Analysis of Subcellular Patterns in Human Protein Atlas Images

Center for Bioimage Informatics, and Departments of Biological Sciences, Biomedical Engineering, and Machine Learning, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15217, USA.
Journal of Proteome Research (Impact Factor: 4.25). 07/2008; 7(6):2300-8. DOI: 10.1021/pr7007626
Source: PubMed


The systematic study of subcellular location patterns is required to fully characterize the human proteome, as subcellular location provides critical context necessary for understanding a protein's function. The analysis of tens of thousands of expressed proteins for the many cell types and cellular conditions under which they may be found creates a need for automated subcellular pattern analysis. We therefore describe the application of automated methods, previously developed and validated by our laboratory on fluorescence micrographs of cultured cell lines, to analyze subcellular patterns in tissue images from the Human Protein Atlas. The Atlas currently contains images of over 3000 protein patterns in various human tissues obtained using immunohistochemistry. We chose a 16 protein subset from the Atlas that reflects the major classes of subcellular location. We then separated DNA and protein staining in the images, extracted various features from each image, and trained a support vector machine classifier to recognize the protein patterns. Our results show that our system can distinguish the patterns with 83% accuracy in 45 different tissues, and when only the most confident classifications are considered, this rises to 97%. These results are encouraging given that the tissues contain many different cell types organized in different manners, and that the Atlas images are of moderate resolution. The approach described is an important starting point for automatically assigning subcellular locations on a proteome-wide basis for collections of tissue images such as the Atlas.

11 Reads
  • Source
    • "studies on global subcellular location features, which can be regarded as global distribution features [6], for example, Haralick features calculated from gray-level cooccurrence matrix of image and DNA features extracted from the distance information between the relative protein and nuclear [7]; "
    [Show abstract] [Hide abstract]
    ABSTRACT: Human protein subcellular location prediction can provide critical knowledge for understanding a protein's function. Since significant progress has been made on digital microscopy, automated image-based protein subcellular location classification is urgently needed. In this paper, we aim to investigate more representative image features that can be effectively used for dealing with the multilabel subcellular image samples. We prepared a large multilabel immunohistochemistry (IHC) image benchmark from the Human Protein Atlas database and tested the performance of different local texture features, including completed local binary pattern, local tetra pattern, and the standard local binary pattern feature. According to our experimental results from binary relevance multilabel machine learning models, the completed local binary pattern, and local tetra pattern are more discriminative for describing IHC images when compared to the traditional local binary pattern descriptor. The combination of these two novel local pattern features and the conventional global texture features is also studied. The enhanced performance of final binary relevance classification model trained on the combined feature space demonstrates that different features are complementary to each other and thus capable of improving the accuracy of classification.
    06/2014; 2014(12):429049. DOI:10.1155/2014/429049
  • Source
    • "Baseline feature sets As a baseline feature set, we used a global feature set, which includes Haralick texture features (Haralick et al., 1973), parameter-free Threshold Adjacency Statistics (Coelho et al., 2010b; Hamilton et al., 2007), object and skeleton features (Boland and Murphy, 2001), and overlap features (Newberg and Murphy, 2008). "
    [Show abstract] [Hide abstract]
    ABSTRACT: Evaluation of previous systems for automated determination of subcellular location from microscope images has been done using datasets in which each location class consisted of multiple images of the same representative protein. Here, we frame a more challenging and useful problem where previously unseen proteins are to be classified. Using CD-tagging, we generated two new image datasets for evaluation of this problem, which contain several different proteins for each location class. Evaluation of previous methods on these new datasets showed that it is much harder to train a classifier that generalizes across different proteins than one that simply recognizes a protein it was trained on.We therefore developed and evaluated additional approaches, incorporating novel modifications of local features techniques. These extended the notion of local features to exploit both the protein image and any reference markers that were imaged in parallel. With these, we obtained a large accuracy improvement in our new datasets over existing methods. Additionally, these features help achieve classification improvements for other previously studied datasets. The datasets are available for download at The software was written in Python and C++ and is available under an open-source license at The code is split into a library which can be easily reused for other data and a small driver script for reproducing all results presented here. A step-by-step tutorial on applying the methods to new datasets is also available at that address.
    Bioinformatics 07/2013; 29(18). DOI:10.1093/bioinformatics/btt392 · 4.98 Impact Factor
    • "Scoring of breast TMA spots has been done through ordinal regression methods.[9] A framework towards an automated analysis of sub-cellular patterns in human protein atlas images yielded 83% accuracy in 45 different tissues.[10] Here we describe and compare methods to automatically classify staining patterns of proteins with unknown as well as known localization. "
    [Show abstract] [Hide abstract]
    ABSTRACT: The Human Protein Atlas (HPA) is an effort to map the location of all human proteins ( It contains a large number of histological images of sections from human tissue. Tissue micro arrays (TMA) are imaged by a slide scanning microscope, and each image represents a thin slice of a tissue core with a dark brown antibody specific stain and a blue counter stain. When generating antibodies for protein profiling of the human proteome, an important step in the quality control is to compare staining patterns of different antibodies directed towards the same protein. This comparison is an ultimate control that the antibody recognizes the right protein. In this paper, we propose and evaluate different approaches for classifying sub-cellular antibody staining patterns in breast tissue samples. The proposed methods include the computation of various features including gray level co-occurrence matrix (GLCM) features, complex wavelet co-occurrence matrix (CWCM) features, and weighted neighbor distance using compound hierarchy of algorithms representing morphology (WND-CHARM)-inspired features. The extracted features are used into two different multivariate classifiers (support vector machine (SVM) and linear discriminant analysis (LDA) classifier). Before extracting features, we use color deconvolution to separate different tissue components, such as the brownly stained positive regions and the blue cellular regions, in the immuno-stained TMA images of breast tissue. We present classification results based on combinations of feature measurements. The proposed complex wavelet features and the WND-CHARM features have accuracy similar to that of a human expert. Both human experts and the proposed automated methods have difficulties discriminating between nuclear and cytoplasmic staining patterns. This is to a large extent due to mixed staining of nucleus and cytoplasm. Methods for quantification of staining patterns in histopathology have many applications, ranging from antibody quality control to tumor grading.
    03/2013; 4(Suppl):S14. DOI:10.4103/2153-3539.109881
Show more


11 Reads
Available from