Automated learning of generative models for subcellular location: Building blocks for systems biology

Center for Bioimage Informatics, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA.
Cytometry Part A (Impact Factor: 2.93). 12/2007; 71(12):978-90. DOI: 10.1002/cyto.a.20487
Source: PubMed


The goal of location proteomics is the systematic and comprehensive study of protein subcellular location. We have previously developed automated, quantitative methods to identify protein subcellular location families, but there have been no effective means of communicating their patterns to integrate them with other information for building cell models. We built generative models of subcellular location that are learned from a collection of images so that they not only represent the pattern, but also capture its variation from cell to cell. Our models contain three components: a nuclear model, a cell shape model and a protein-containing object model. We built models for six patterns that consist primarily of discrete structures. To validate the generated images, we showed that they are recognized with reasonable accuracy by a classifier trained on real images. We also showed that the model parameters themselves can be used as features to discriminate the classes. The models allow the synthesis of images with the expectation that they are drawn from the same underlying statistical distribution as the images used to train them. They can potentially be combined for many proteins to yield a high-resolution location map in support of systems biology.



Available from: Robert F Murphy
  • Source
    • "In the work described here, we extend the nonparametric models to 3D shapes and to the combination of cell and nuclear shapes. This eliminates the need to explicitly model the conditional dependency of one shape on the other, in contrast with the previous parametric models (Zhao and Murphy, 2007; Peng and Murphy, 2011). We also develop generative models of the dynamics of cell and nuclear shape. "
    ABSTRACT: Modeling cell shape variation is critical to our understanding of cell biology. Previous work has demonstrated the utility of non-rigid image registration methods for the construction of non-parametric nuclear shape models where pairwise deformation distances are measured between all shapes and are embedded into a low dimensional shape space. Using these methods we explore the relationship between cell shape and nuclear shape. We find that these are frequently dependent upon each other and use this as the motivation for the development of combined cell and nuclear shape space models, extending non-parametric cell representations to multiple component 3D cellular shapes and identifying modes of joint shape variation. We learn a first-order dynamics model to predict cell and nuclear shapes given shapes at a previous time point. We use this to determine the effects of endogenous protein tags or drugs on the shape dynamics of cell lines, and show that tagged C1QBP reduces the correlation between cell and nuclear shape. To reduce the computational cost of learning these models, we demonstrate the ability to reconstruct shape spaces using a fraction of computed pairwise distances. The open source tools provide a powerful basis for future studies of the molecular basis of cell organization.
    Molecular biology of the cell 09/2015; DOI:10.1091/mbc.E15-06-0370 · 4.47 Impact Factor
  • Source
    • "Using a different approach, another group developed a tool using machine learning from real data capable of generating the whole cell, including structures like the nucleus, proteins, cell membrane and cytoplasm components such as microtubules. Although the model was capable of extracting a very precise shape model from real image data, the model could not be described in precise mathematical terms [39]. The developed work later evolved in the development of a publically available toolbox called 'CellOrganizer' [40]. "
    ABSTRACT: Escherichia coli is a model organism for the study of multiple biological processes, including gene expression and cellular aging. Recently, these studies started to rely on temporal single cell imaging. To support these efforts, available automated image analysis methods should be improved. One important step is their validation. Ideally, the “ground truth” of the images should be known, which is possible only in synthetic images. To simulate artificial images of E. coli cells, we are developing the ‘miSimBa’ tool (Microscopy Image Simulator of Bacterial Cells). ‘miSimBa’ simulates images that reproduce the spatial and temporal bacterial organization by modelling realistically cell morphology (shape, size and spatial arrangement), cell growth and division, cell motility and some internal functions and intracellular structures, namely, the nucleoid. This tool also incorporates image acquisition parameters that simulate illumination and the primary sources of noise.
    Bioengineering (ENBENG), 2015 IEEE 4th Portuguese Meeting on, Porto; 02/2015
  • Source
    • "One advantage of level-set methods is that they provide a natural way to handle splitting and merging object contours. The 3D MLS algorithm uses k-means clustering (Jain, 1988) followed by expectation and maximization (Zhao and Murphy, 2007) to separate the foreground from the background. A level-set function is established for each connected component of the initial segmentation and level-set evolution proceeds in two stages. "
    ABSTRACT: Stereologic cell counting has had a major impact on the field of neuroscience. A major bottleneck in stereologic cell counting is that the user must manually decide whether or not each cell is counted according to three-dimensional (3D) stereologic counting rules by visual inspection within hundreds of microscopic fields-of-view per investigated brain or brain region. Reliance on visual inspection forces stereologic cell counting to be very labor-intensive and time-consuming, and is the main reason why biased, non-stereologic two-dimensional (2D) "cell counting" approaches have remained in widespread use. We present an evaluation of the performance of modern automated cell detection and segmentation algorithms as a potential alternative to the manual approach in stereologic cell counting. The image data used in this study were 3D microscopic images of thick brain tissue sections prepared with a variety of commonly used nuclear and cytoplasmic stains. The evaluation compared the numbers and locations of cells identified unambiguously and counted exhaustively by an expert observer with those found by three automated 3D cell detection algorithms: nuclei segmentation from the FARSIGHT toolkit, nuclei segmentation by 3D multiple level set methods, and the 3D object counter plug-in for ImageJ. Of these methods, FARSIGHT performed best, with true-positive detection rates between 38 and 99% and false-positive rates from 3.6 to 82%. The results demonstrate that the current automated methods suffer from lower detection rates and higher false-positive rates than are acceptable for obtaining valid estimates of cell numbers. Thus, at present, stereologic cell counting with manual decision for object inclusion according to unbiased stereologic counting rules remains the only adequate method for unbiased cell quantification in histologic tissue sections.
    Frontiers in Neuroanatomy 05/2014; 8:27. DOI:10.3389/fnana.2014.00027 · 3.54 Impact Factor