A Framework for the Automated Analysis of Subcellular Patterns in Human Protein Atlas Images

Center for Bioimage Informatics, and Departments of Biological Sciences, Biomedical Engineering, and Machine Learning, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15217, USA.
Journal of Proteome Research (Impact Factor: 4.25). 07/2008; 7(6):2300-8. DOI: 10.1021/pr7007626
Source: PubMed


The systematic study of subcellular location patterns is required to fully characterize the human proteome, as subcellular location provides critical context necessary for understanding a protein's function. The analysis of tens of thousands of expressed proteins for the many cell types and cellular conditions under which they may be found creates a need for automated subcellular pattern analysis. We therefore describe the application of automated methods, previously developed and validated by our laboratory on fluorescence micrographs of cultured cell lines, to analyze subcellular patterns in tissue images from the Human Protein Atlas. The Atlas currently contains images of over 3000 protein patterns in various human tissues obtained using immunohistochemistry. We chose a 16 protein subset from the Atlas that reflects the major classes of subcellular location. We then separated DNA and protein staining in the images, extracted various features from each image, and trained a support vector machine classifier to recognize the protein patterns. Our results show that our system can distinguish the patterns with 83% accuracy in 45 different tissues, and when only the most confident classifications are considered, this rises to 97%. These results are encouraging given that the tissues contain many different cell types organized in different manners, and that the Atlas images are of moderate resolution. The approach described is an important starting point for automatically assigning subcellular locations on a proteome-wide basis for collections of tissue images such as the Atlas.

  • Source
    • "Unsupervised stain decomposition methods usually estimate a spectrum matrix M from a query image directly. Following a color linear unmixing model for fluorescence images where color is superposition of stains in the linear RGB color space, independent component analysis or NMF is applied to an RGB-format pathology image for stain decomposition [18], [19]. However, pathology imaging follows the Beer- Lambert law, rather than the additive staining model of fluorescence imaging. "
    [Show abstract] [Hide abstract]
    ABSTRACT: In digital pathology, to address color variation and histological component co-localization in pathology images, stain decomposition is usually performed preceding spectral normalization and tissue component segmentation. This paper examines the problem of stain decomposition, which is a naturally nonnegative matrix factorization (NMF) problem in algebra, and introduces a systematical and analytical solution consisting of a circular color analysis module and a NMF-based computation module. Unlike the paradigm of existing stain decomposition algorithms where stain proportions are computed from estimated stain spectra using a matrix inverse operation directly, the introduced solution estimates stain spectra and stain depths via probabilistic reasoning individually. Since the proposed method pays extra attentions to achromatic pixels in color analysis and stain co-occurrence in pixel clustering, it achieves consistent and reliable stain decomposition with minimum decomposition residue. Particularly, aware of the periodic and angular nature of hue, we propose the use of a circular von Mises mixture model to analyze hue distribution, and provide a complete color-based pixel soft-clustering solution to address color mixing introduced by stain overlap. This innovation combined with saturationweighted computation makes our work effective for weak stains and broad-spectrum stains. Extensive experimentation on multiple public pathology datasets suggests that our approach outperforms state-of-the-art blind stain separation methods in terms of decomposition effectiveness.
    Full-text · Article · Dec 2015 · IEEE Journal of Biomedical and Health Informatics
  • Source
    • "Considering different features will have their own advantages, a common strategy is to fuse multiple types of features. For instance, different features are concatenated as a long vector to perform the subsequent classification task (Xu, et al., 2013)(Newberg and Murphy, 2008)(Yang, et al., 2014). Intuitively, since single type of features cannot reflect all the information of a protein image, fusing multiple types of features together is expected to be a more promising way. "
    [Show abstract] [Hide abstract]
    ABSTRACT: Motivation: The systematic study of subcellular location pattern is very important for fully characterizing the human proteome. Nowadays, with the great advances in automated microscopic imaging, accurate bioimage based classification methods to predict protein subcellular locations are highly desired. All existing models were constructed on the independent parallel hypothesis, where the cellular component classes are positioned independently in a multi-class classification engine. The important structural information of cellular compartments is missed. To deal with this problem for developing more accurate models, we proposed a novel cell structure-driven classifier construction approach (SC-PSorter) by employing the prior biological structural information in the learning model. Specifically, the structural relationship among the cellular components is reflected by a new codeword matrix under the error correcting output coding (ECOC) framework. Then, we construct multiple SC-PSorter based classifiers corresponding to the columns of the ECOC codeword matrix using a multi-kernel support vector machine (SVM) classification approach. Finally, we perform the classifier ensemble by combining those multiple SC-PSorter based classifiers via majority voting. Results: We evaluate our method on a collection of 1636 immunohistochemistry images from the Human Protein Atlas (HPA) database. The experimental results show that our method achieves an overall accuracy of 89.0%, which is 6.4% higher than the state-of-the-art method. Availability: The dataset and code can be downloaded from CONTACT:
    Full-text · Article · Sep 2015 · Bioinformatics
  • Source
    • "studies on global subcellular location features, which can be regarded as global distribution features [6], for example, Haralick features calculated from gray-level cooccurrence matrix of image and DNA features extracted from the distance information between the relative protein and nuclear [7]; "
    [Show abstract] [Hide abstract]
    ABSTRACT: Human protein subcellular location prediction can provide critical knowledge for understanding a protein's function. Since significant progress has been made on digital microscopy, automated image-based protein subcellular location classification is urgently needed. In this paper, we aim to investigate more representative image features that can be effectively used for dealing with the multilabel subcellular image samples. We prepared a large multilabel immunohistochemistry (IHC) image benchmark from the Human Protein Atlas database and tested the performance of different local texture features, including completed local binary pattern, local tetra pattern, and the standard local binary pattern feature. According to our experimental results from binary relevance multilabel machine learning models, the completed local binary pattern, and local tetra pattern are more discriminative for describing IHC images when compared to the traditional local binary pattern descriptor. The combination of these two novel local pattern features and the conventional global texture features is also studied. The enhanced performance of final binary relevance classification model trained on the combined feature space demonstrates that different features are complementary to each other and thus capable of improving the accuracy of classification.
    Full-text · Article · Jun 2014
Show more