A framework for the automated analysis of subcellular patterns in human protein atlas images

Center for Bioimage Informatics, and Departments of Biological Sciences, Biomedical Engineering, and Machine Learning, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15217, USA.
Journal of Proteome Research (Impact Factor: 5). 07/2008; 7(6):2300-8. DOI: 10.1021/pr7007626
Source: PubMed

ABSTRACT: The systematic study of subcellular location patterns is required to fully characterize the human proteome, as subcellular location provides critical context necessary for understanding a protein's function. The analysis of tens of thousands of expressed proteins for the many cell types and cellular conditions under which they may be found creates a need for automated subcellular pattern analysis. We therefore describe the application of automated methods, previously developed and validated by our laboratory on fluorescence micrographs of cultured cell lines, to analyze subcellular patterns in tissue images from the Human Protein Atlas. The Atlas currently contains images of over 3000 protein patterns in various human tissues obtained using immunohistochemistry. We chose a 16-protein subset from the Atlas that reflects the major classes of subcellular location. We then separated DNA and protein staining in the images, extracted various features from each image, and trained a support vector machine classifier to recognize the protein patterns. Our results show that our system can distinguish the patterns with 83% accuracy in 45 different tissues, and when only the most confident classifications are considered, this rises to 97%. These results are encouraging given that the tissues contain many different cell types organized in different manners, and that the Atlas images are of moderate resolution. The approach described is an important starting point for automatically assigning subcellular locations on a proteome-wide basis for collections of tissue images such as the Atlas.
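
The abstract reports one accuracy over all images and a higher one when only the most confident classifications are kept. The sketch below illustrates that general idea of confidence-thresholded accuracy; it is not the authors' code. The function name `accuracy_at_confidence`, the threshold values, and the toy probabilities are all assumptions made for illustration and are not data from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def accuracy_at_confidence(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float,
) -> Tuple[Optional[float], float]:
    """Return (accuracy, coverage) over samples whose top class
    probability is at least `threshold`.

    accuracy is None when no sample passes the threshold.
    coverage is the fraction of samples that were kept.
    """
    kept = 0
    correct = 0
    for p, y in zip(probs, labels):
        # Predicted class = index of the largest class probability.
        pred = max(range(len(p)), key=p.__getitem__)
        if p[pred] >= threshold:
            kept += 1
            if pred == y:
                correct += 1
    if kept == 0:
        return None, 0.0
    return correct / kept, kept / len(labels)


# Hypothetical two-class toy data (NOT results from the paper).
toy_probs: List[List[float]] = [
    [0.90, 0.10],
    [0.55, 0.45],
    [0.20, 0.80],
    [0.60, 0.40],
]
toy_labels: List[int] = [0, 1, 1, 0]

if __name__ == "__main__":
    for t in (0.0, 0.7):
        acc, cov = accuracy_at_confidence(toy_probs, toy_labels, t)
        print(f"threshold={t}: accuracy={acc}, coverage={cov}")
```

Raising the threshold trades coverage for accuracy: fewer images receive a label, but the labels that remain are more reliable. In practice the class probabilities would come from the trained classifier's confidence estimates rather than from a hand-written list.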

  •
    ABSTRACT: There is a long-term interest in the challenging task of finding translocated and mislocated cancer biomarker proteins. Bioimages of subcellular protein distribution are new data sources which have attracted much attention in recent years because of their intuitive and detailed descriptions of protein distribution. However, automated methods in large-scale biomarker screening suffer significantly from the lack of subcellular location annotations for bioimages from cancer tissues. The transfer prediction idea of applying models trained on normal tissue proteins to predict the subcellular locations of cancerous ones is arbitrary because the protein distribution patterns may differ in normal and cancerous states. We developed a new semi-supervised protocol that can use unlabeled cancer protein data in model construction by an iterative and incremental training strategy. Our approach enables us to selectively use the low-quality images in normal states to expand the training sample space and provides a general way for dealing with the small size of annotated images used together with large unannotated ones. Experiments demonstrate that the new semi-supervised protocol can result in improved accuracy and sensitivity of subcellular location difference detection. Availability and Implementation: The data and code are available at: Supplementary Information: Supplementary data are available at Bioinformatics online.
    Bioinformatics 11/2014; DOI:10.1093/bioinformatics/btu772 · 4.62 Impact Factor
  •
    ABSTRACT: Molecular biomarkers are changes measured in biological samples that reflect disease states. Such markers can help clinicians identify types of cancer or stages of progression, and they can guide in tailoring specific therapies. Many efforts to identify biomarkers consider genes that mutate between normal and cancerous tissues or changes in protein or RNA expression levels. Here we define location biomarkers, proteins that undergo changes in subcellular location that are indicative of disease. To discover such biomarkers, we have developed an automated pipeline to compare the subcellular location of proteins between two sets of immunohistochemistry images. We used the pipeline to compare images of healthy and tumor tissue from the Human Protein Atlas, ranking hundreds of proteins in breast, liver, prostate, and bladder based on how much their location was estimated to have changed. The performance of the system was evaluated by determining whether proteins previously known to change location in tumors were ranked highly. We present a number of candidate location biomarkers for each tissue, and identify biochemical pathways that are enriched in proteins that change location. The analysis technology is anticipated to be useful not only for discovering new location biomarkers but also for enabling automated analysis of biomarker distributions as an aid to determining diagnosis.
    Proceedings of the National Academy of Sciences 12/2014; DOI:10.1073/pnas.1415120112 · 9.81 Impact Factor
  •
    ABSTRACT: Objective: To identify gene dosage changes associated with nonobstructive azoospermia (NOA) using array comparative genomic hybridization (aCGH). Design: Prospective study. Setting: Medical school. Patient(s): One hundred ten men with NOA and 78 fertile controls. Intervention(s): None. Main Outcome Measure(s): The study has four distinct analytic components: aCGH, a molecular karyotype that detects copy number variations (CNVs); Taqman CNV assays to validate CNVs; mutation identification by Sanger sequencing; and histological analyses of testicular tissues. Result(s): A microduplication at 20q11.22 encompassing E2F transcription factor-1 (E2F1) was identified in one of eight men with NOA analyzed using aCGH. CNVs were confirmed, and an additional 102 men with NOA were screened using Taqman CNV assays, for a total of 110 NOA men analyzed for CNVs in E2F1. Eight of 110 (7.3%) NOA men had microduplications or microdeletions of E2F1 that were absent in fertile controls. Conclusion(s): E2F1 microduplications or microdeletions are present in men with NOA (7.3%). Duplications or deletions of E2F1 occur very rarely in the general population (0.011%), but E2F1 gene dosage changes, previously reported only in cancers, are present in a subset of NOA men. These results recapitulate the infertility phenotype seen in mice lacking or overexpressing E2f1.
    Fertility and Sterility 10/2014; DOI:10.1016/j.fertnstert.2014.09.021 · 4.30 Impact Factor

