MOTIF-EM: An automated computational tool for identifying conserved regions in CryoEM structures

NIH Center for Biomedical Computation, Stanford University, Stanford, CA 94305, USA.
Bioinformatics (Impact Factor: 4.98). 06/2010; 26(12):i301-9. DOI: 10.1093/bioinformatics/btq195
Source: PubMed


We present MOTIF-EM, a new, first-of-its-kind, fully automated computational tool for identifying regions, domains or motifs in cryoEM maps of large macromolecular assemblies (such as chaperonins and viruses) that remain conformationally conserved. As a by-product, regions that are not conserved are also revealed; these can indicate local molecular flexibility related to biological activity. MOTIF-EM takes cryoEM volumetric maps as inputs. The technique it uses to detect conserved sub-structures is inspired by a recent breakthrough in 2D object recognition. It works by constructing rotationally invariant, low-dimensional representations of local regions in the input cryoEM maps. Correspondences are established across the input maps by comparing these reduced representations using a simple metric. The correspondences are clustered using hash tables, and graph theory is used to retrieve conserved structural domains or motifs. MOTIF-EM has been used to extract domains that remain structurally conserved in different functional states from maps of large macromolecular assemblies, including those of the viruses P22 and epsilon 15, the 70S ribosome and GroEL. Our method can also be used to build atomic models for some maps. We also used MOTIF-EM to identify the conserved folds shared among the dsDNA bacteriophages HK97, epsilon 15 and φ29, even though they have low sequence similarity.
Supplementary information: Supplementary data are available at Bioinformatics online.
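
The first two steps described in the abstract (a rotationally invariant reduced representation of a local region, then correspondence by a simple metric) can be illustrated with a minimal sketch. This is not the MOTIF-EM implementation: the shell-averaged density descriptor, the sparse voxel dictionary, the Euclidean metric, and the names `shell_descriptor`, `match` and `rotate_z90` are all assumptions chosen for illustration. The hash-table clustering and graph-based retrieval steps are not shown.

```python
import math

def shell_descriptor(voxels, center, radius, n_shells=4):
    """Mean density in concentric shells around `center` (hypothetical descriptor).

    voxels: dict mapping integer (x, y, z) coordinates to density values.
    Distances from the center do not change under rotation about the center,
    so the resulting vector is rotationally invariant.
    """
    cx, cy, cz = center
    shells = [[] for _ in range(n_shells)]
    for (x, y, z), rho in voxels.items():
        r = math.sqrt((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2)
        if r < radius:
            k = min(int(r / radius * n_shells), n_shells - 1)
            shells[k].append(rho)
    # fsum gives the same sum regardless of voxel iteration order
    return [math.fsum(s) / len(s) if s else 0.0 for s in shells]

def distance(a, b):
    """Euclidean distance between two descriptors (assumed 'simple metric')."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def match(descs_a, descs_b, tol):
    """Pair each descriptor in descs_a with its nearest one in descs_b, if within tol."""
    pairs = []
    for i, a in enumerate(descs_a):
        j, d = min(((j, distance(a, b)) for j, b in enumerate(descs_b)),
                   key=lambda t: t[1])
        if d <= tol:
            pairs.append((i, j))
    return pairs

def rotate_z90(voxels):
    """Rotate a voxel set by 90 degrees about the z axis through the origin."""
    return {(-y, x, z): rho for (x, y, z), rho in voxels.items()}

# Toy region and a rotated copy of it
blob = {(0, 0, 0): 4.0, (1, 0, 0): 2.0, (2, 1, 0): 1.0, (0, 0, 3): 0.5}
d1 = shell_descriptor(blob, (0, 0, 0), radius=4.0)
d2 = shell_descriptor(rotate_z90(blob), (0, 0, 0), radius=4.0)
pairs = match([d1], [[0.0, 0.0, 0.0, 0.0], d2], tol=1e-9)
```

In this toy case the region and its rotated copy yield the same descriptor, so `match` pairs them while rejecting the empty region. A real map would need descriptors computed at many candidate centers and a tolerance tuned to noise and resolution.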



Available from: Michael Levitt, Aug 13, 2014
  • "To introduce an efficient way to compare the similarities between all the distinct objects, we introduce rotation-invariant feature vectors (Kazhdan et al., 2003; Saha et al., 2010; Xu et al., 2009)."
    ABSTRACT: Cryo electron tomography (CryoET) produces 3D density maps of biological specimens in their near-native states. Applied to small cells, cryoET produces 3D snapshots of the cellular distributions of large complexes. However, retrieving this information is non-trivial due to the low resolution and low signal-to-noise ratio in tomograms. Current pattern recognition methods identify complexes by matching known structures to the cryo electron tomogram. However, so far only a small fraction of all protein complexes have been structurally resolved. It is, therefore, of great importance to develop template-free methods for the discovery of previously unknown protein complexes in cryo electron tomograms. Here, we have developed an inference method for the template-free discovery of frequently occurring protein complexes in cryo electron tomograms. We provide a first proof-of-principle of the approach and assess its applicability using realistically simulated tomograms, allowing for the inclusion of noise and distortions due to missing wedge and electron optical factors. Our method is a step toward the template-free discovery of the shapes, abundance and spatial distributions of previously unknown macromolecular complexes in whole cell tomograms.
    Bioinformatics 07/2011; 27(13):i69-76. DOI:10.1093/bioinformatics/btr207 · 4.98 Impact Factor
  • ABSTRACT: In fitting atomic structures into cryoEM density maps of macromolecular assemblies, the cross-correlation function (CCF) is the most prevalent method of scoring the goodness-of-fit. However, there are still many possible, less studied ways of scoring fits. In this paper, we introduce four scores new to cryoEM fitting and compare their performance to three known scores. Our benchmark consists of (a) 4 protein assemblies with simulated maps at 5-20 Å resolution, including the heptameric ring of GroEL; and (b) 4 experimental maps of GroEL at ∼6-23 Å resolution with corresponding fitted atomic models. We perturb each fit 1000 times and assess each new fit with each score. The correlation between a score and the Cα RMSD of each fit from the "correctly" fitted structure shows that the CCF is one of the best scores, but in certain situations could be augmented or even replaced by other scores. For instance, our implementation of a score based on mutual information outperforms or is comparable to the CCF in almost all test cases, and our new "envelope score" works as well as the CCF at sub-nanometer resolution but is an order of magnitude faster to calculate. The results also suggest that the width of the Gaussian function used to blur the atomic structure into a density map can significantly affect the fitting process. Finally, we show that our score-testing method, when combined with the Laplacian CCF or the mutual information scores, can be used as a statistical tool for improving cryoEM density fitting.
    Journal of Structural Biology 02/2011; 174(2):333-43. DOI:10.1016/j.jsb.2011.01.012 · 3.23 Impact Factor
  • ABSTRACT: This Meeting Review describes the proceedings and conclusions from the inaugural meeting of the Electron Microscopy Validation Task Force, organized by the Unified Data Resource for 3DEM and held at Rutgers University in New Brunswick, NJ on September 28 and 29, 2010. At the workshop, a group of scientists involved in collecting electron microscopy data, using the data to determine three-dimensional electron microscopy (3DEM) density maps, and building molecular models into the maps explored how to assess maps, models, and other data that are deposited into the Electron Microscopy Data Bank and Protein Data Bank public data archives. The specific recommendations resulting from the workshop aim to increase the impact of 3DEM in biology and medicine.
    Structure 02/2012; 20(2):205-14. DOI:10.1016/j.str.2011.12.014 · 5.62 Impact Factor