Automated learning of generative models for subcellular location: Building blocks for systems biology

Center for Bioimage Informatics, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA.
Cytometry Part A (Impact Factor: 2.93). 12/2007; 71(12):978-90. DOI: 10.1002/cyto.a.20487
Source: PubMed


The goal of location proteomics is the systematic and comprehensive study of protein subcellular location. We have previously developed automated, quantitative methods to identify protein subcellular location families, but there have been no effective means of communicating their patterns to integrate them with other information for building cell models. We built generative models of subcellular location that are learned from a collection of images so that they not only represent the pattern, but also capture its variation from cell to cell. Our models contain three components: a nuclear model, a cell shape model and a protein-containing object model. We built models for six patterns that consist primarily of discrete structures. To validate the generated images, we showed that they are recognized with reasonable accuracy by a classifier trained on real images. We also showed that the model parameters themselves can be used as features to discriminate the classes. The models allow the synthesis of images with the expectation that they are drawn from the same underlying statistical distribution as the images used to train them. They can potentially be combined for many proteins to yield a high resolution location map in support of systems biology.
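The three-component structure described in the abstract (nucleus, cell shape, protein-containing objects) can be illustrated with a deliberately simplified sketch. Everything below is an assumption made for illustration and is not the parameterization used in the paper: circles stand in for real shapes, a single Gaussian stands in for each learned distribution, and the names `fit_gaussian`, `learn_model`, and `sample_cell` are hypothetical. The point is only the flow: learn per-component statistics from a collection of cells, then draw a nucleus, a cell shape that depends on it, and objects placed between the two.

```python
import math
import random


def fit_gaussian(values):
    """Return (mean, standard deviation) of a list of measurements."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, math.sqrt(var)


def learn_model(cells):
    """Fit one Gaussian per component from per-cell measurements (toy model)."""
    return {
        "nuclear_radius": fit_gaussian([c["nuclear_radius"] for c in cells]),
        # The cell is described relative to the nucleus, so the sampled
        # cell size depends on the sampled nuclear size.
        "cell_to_nuclear_ratio": fit_gaussian(
            [c["cell_radius"] / c["nuclear_radius"] for c in cells]),
        "object_count": fit_gaussian([c["object_count"] for c in cells]),
    }


def sample_cell(model, rng):
    """Draw one synthetic cell: nucleus, then cell boundary, then objects."""
    nuclear_radius = max(0.1, rng.gauss(*model["nuclear_radius"]))
    ratio = max(1.05, rng.gauss(*model["cell_to_nuclear_ratio"]))
    cell_radius = nuclear_radius * ratio
    count = max(0, round(rng.gauss(*model["object_count"])))
    objects = []
    for _ in range(count):
        # Uniform over the area between the nuclear and cell boundaries.
        r = math.sqrt(rng.uniform(nuclear_radius ** 2, cell_radius ** 2))
        theta = rng.uniform(0.0, 2.0 * math.pi)
        objects.append((r * math.cos(theta), r * math.sin(theta)))
    return {
        "nuclear_radius": nuclear_radius,
        "cell_radius": cell_radius,
        "objects": objects,
    }


# Made-up training measurements, for illustration only.
training = [
    {"nuclear_radius": 5.0, "cell_radius": 12.0, "object_count": 40},
    {"nuclear_radius": 5.5, "cell_radius": 14.0, "object_count": 55},
    {"nuclear_radius": 4.8, "cell_radius": 11.0, "object_count": 35},
]
model = learn_model(training)
synthetic_cells = [sample_cell(model, random.Random(seed)) for seed in range(3)]
```

Because each synthetic cell is a fresh draw from the fitted distributions, repeated sampling reproduces cell-to-cell variation rather than a single average pattern.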



Available from: Robert F Murphy
  • Source
    • "The WESTPA software in turn manages ensembles of the MCell simulations, for either weighted ensemble or brute-force sampling. CellOrganizer is an open source tool for learning conditional generative models of cellular organization from images [50-58]. From these models, new cellular geometries can be generated from different parts of the "shape space" of the system."
    ABSTRACT: The long-term goal of connecting scales in biological simulation can be facilitated by scale-agnostic methods. We demonstrate that the weighted ensemble (WE) strategy, initially developed for molecular simulations, applies effectively to spatially resolved cell-scale simulations. The WE approach runs an ensemble of parallel trajectories with assigned weights and uses a statistical resampling strategy of replicating and pruning trajectories to focus computational effort on difficult-to-sample regions. The method can also generate unbiased estimates of non-equilibrium and equilibrium observables, sometimes with significantly less aggregate computing time than would be possible using standard parallelization. Here, we use WE to orchestrate particle-based kinetic Monte Carlo simulations, which include spatial geometry (e.g., of organelles, plasma membrane) and biochemical interactions among mobile molecular species. We study a series of models exhibiting spatial, temporal and biochemical complexity and show that although WE has important limitations, it can achieve performance significantly exceeding standard parallel simulation, by orders of magnitude for some observables.
    Full-text · Article · Feb 2016 · PLoS Computational Biology
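The replicate-and-prune resampling that the abstract above describes can be sketched as follows. This is a minimal illustration, not the WESTPA implementation: the function names `resample_bin` and `resample_ensemble`, the user-supplied binning function, and the fixed target count per bin are assumptions made for the example. Splitting halves the weight of the heaviest trajectory, and merging combines the two lightest and keeps one of their states with probability proportional to weight, so the total weight is conserved.

```python
import random


def resample_bin(walkers, target, rng):
    """Split or merge walkers in one bin until exactly `target` remain.

    walkers: list of (state, weight) pairs with positive weights.
    """
    walkers = list(walkers)
    if not walkers:
        return walkers  # an unoccupied bin stays empty
    while len(walkers) < target:
        # Replicate: split the heaviest walker into two equal halves.
        walkers.sort(key=lambda w: w[1])
        state, weight = walkers.pop()
        walkers += [(state, weight / 2), (state, weight / 2)]
    while len(walkers) > target:
        # Prune: merge the two lightest walkers, keeping one state with
        # probability proportional to its weight.
        walkers.sort(key=lambda w: w[1])
        (s1, w1), (s2, w2) = walkers[0], walkers[1]
        keep = s1 if rng.random() < w1 / (w1 + w2) else s2
        walkers[:2] = [(keep, w1 + w2)]
    return walkers


def resample_ensemble(walkers, bin_of, target_per_bin, rng):
    """Group walkers by bin and resample each occupied bin independently."""
    bins = {}
    for state, weight in walkers:
        bins.setdefault(bin_of(state), []).append((state, weight))
    result = []
    for key in sorted(bins):
        result += resample_bin(bins[key], target_per_bin, rng)
    return result


# Three walkers along a 1D progress coordinate, two bins split at 0.5.
ensemble = [(0.1, 0.5), (0.2, 0.3), (0.9, 0.2)]
resampled = resample_ensemble(
    ensemble, lambda x: int(x >= 0.5), 2, random.Random(0))
```

In a full simulation this step would alternate with a short propagation of every walker; only the resampling step is shown here.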
  • Source
    • "In the work described here, we extend the nonparametric models to 3D shapes and to the combination of cell and nuclear shapes. This eliminates the need to explicitly model the conditional dependency of one shape on the other, in contrast with the previous parametric models (Zhao and Murphy, 2007; Peng and Murphy, 2011). We also develop generative models of the dynamics of cell and nuclear shape. "
    ABSTRACT: Modeling cell shape variation is critical to our understanding of cell biology. Previous work has demonstrated the utility of non-rigid image registration methods for the construction of non-parametric nuclear shape models where pairwise deformation distances are measured between all shapes and are embedded into a low dimensional shape space. Using these methods we explore the relationship between cell shape and nuclear shape. We find that these are frequently dependent upon each other and use this as the motivation for the development of combined cell and nuclear shape space models, extending non-parametric cell representations to multiple component 3D cellular shapes and identifying modes of joint shape variation. We learn a first-order dynamics model to predict cell and nuclear shapes given shapes at a previous time point. We use this to determine the effects of endogenous protein tags or drugs on the shape dynamics of cell lines, and show that tagged C1QBP reduces the correlation between cell and nuclear shape. To reduce the computational cost of learning these models, we demonstrate the ability to reconstruct shape spaces using a fraction of computed pairwise distances. The open source tools provide a powerful basis for future studies of the molecular basis of cell organization.
    Preview · Article · Sep 2015 · Molecular biology of the cell
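Embedding pairwise distances into a low dimensional space, as mentioned in the abstract above, is commonly done with multidimensional scaling. The sketch below uses classical MDS as an assumed stand-in; the abstract does not say which embedding method the authors used, and the function name `classical_mds` is hypothetical. It double-centers the squared distances and extracts the leading eigenvectors by power iteration with deflation, using only the standard library.

```python
import math
import random


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Embed n items in `dims` dimensions from an n x n symmetric distance matrix."""
    n = len(dist)
    # Double-center the squared distances to obtain a Gram matrix.
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration for the leading eigenvector of the deflated matrix.
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        if eigval <= 1e-9:
            break  # no further positive eigenvalues; leave remaining axes at zero
        for i in range(n):
            coords[i][d] = math.sqrt(eigval) * v[i]
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords


# Three "shapes" whose pairwise distances form a 3-4-5 triangle.
distances = [[0, 3, 4],
             [3, 0, 5],
             [4, 5, 0]]
shape_space = classical_mds(distances, dims=2)
```

When the input distances are exactly Euclidean, as in this toy case, the embedded points reproduce them; for real deformation distances the embedding is only an approximation.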
  • Source
    • "Using a different approach, another group developed a tool using machine learning from real data capable of generating the whole cell, including structures like the nucleus, proteins, cell membrane and cytoplasm components such as microtubules. Although the model was capable of extracting a very precise shape model from real image data, the model could not be described in precise mathematical terms [39]. This work later evolved into a publicly available toolbox called 'CellOrganizer' [40]."
    ABSTRACT: Escherichia coli is a model organism for the study of multiple biological processes, including gene expression and cellular aging. Recently, these studies started to rely on temporal single cell imaging. To support these efforts, available automated image analysis methods should be improved. One important step is their validation. Ideally, the “ground truth” of the images should be known, which is possible only in synthetic images. To simulate artificial images of E. coli cells, we are developing the ‘miSimBa’ tool (Microscopy Image Simulator of Bacterial Cells). ‘miSimBa’ simulates images that reproduce the spatial and temporal bacterial organization by realistically modelling cell morphology (shape, size and spatial arrangement), cell growth and division, cell motility, and some internal functions and intracellular structures, namely the nucleoid. This tool also incorporates image acquisition parameters that simulate illumination and the primary sources of noise.
    Full-text · Conference Paper · Feb 2015