Conference Paper

Parallel Optimization of Relion: Performance Comparison based on Cluster for CPU/GPU and KNL


Abstract

Relion is a 3D reconstruction program that applies a Bayesian algorithm to electron cryo-microscopy (cryo-EM) data. We analyzed the characteristics of the Relion program and designed a parallelization scheme for it. We applied common code optimization methods to Relion, such as memory access optimization, multithreading optimization, vectorization, and converting coarse-grained parallelism to fine-grained parallelism. As a result, the overall running time of the entire Relion program was reduced by 379 s. We also ran the program on GPU and KNL cluster platforms and compared the results. The results show that the optimization effect for Relion on the GPU platform is better than on the KNL platform.
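The abstract does not say how the coarse-grained work was split up, so the following is only a minimal sketch of the general idea, not the authors' scheme. It assumes a hypothetical workload in which every particle image is scored against every reference projection. The names `squared_diff`, `score_coarse`, and `score_fine` are invented for illustration and are not part of Relion. The coarse-grained version creates one task per image; the fine-grained version creates one task per (image, reference) pair, which exposes more, smaller units of work to the scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def squared_diff(image, reference):
    """Sum of squared pixel differences between two equal-length pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(image, reference))


def score_coarse(images, references, workers=4):
    """Coarse-grained: one task per image; each task loops over all references."""
    def score_image(image):
        return [squared_diff(image, ref) for ref in references]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_image, images))


def score_fine(images, references, workers=4):
    """Fine-grained: one task per (image, reference) pair."""
    n_refs = len(references)
    pairs = [(i, r) for i in range(len(images)) for r in range(n_refs)]

    def score_pair(pair):
        i, r = pair
        return squared_diff(images[i], references[r])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        flat = list(pool.map(score_pair, pairs))  # results keep input order

    # Regroup the flat result list into one row of scores per image.
    return [flat[i * n_refs:(i + 1) * n_refs] for i in range(len(images))]
```

Both functions return the same score table; only the task granularity differs. Python threads are used here purely to show the decomposition and will not reproduce the speedups reported for the optimized program.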

