Berk Hess

KTH Royal Institute of Technology, Stockholm, Sweden

Publications (53) · 156.45 Total Impact Points

  • ABSTRACT: GROMACS is a widely used package for biomolecular simulation, and over the last two decades it has evolved from small-scale efficiency to advanced heterogeneous acceleration and multi-level parallelism targeting some of the largest supercomputers in the world. Here, we describe some of the ways we have been able to realize this through the use of parallelization on all levels, combined with a constant focus on absolute performance. Release 4.6 of GROMACS uses SIMD acceleration on a wide range of architectures, GPU offloading acceleration, and both OpenMP and MPI parallelism within and between nodes, respectively. The recent work on acceleration made it necessary to revisit the fundamental algorithms of molecular simulation, including the concept of neighbor searching, and we discuss the present and future challenges we see for exascale simulation, in particular a very fine-grained task parallelism. We also discuss the software management, code peer review and continuous integration testing required for a project of this complexity.
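    As a rough illustration of the parallelism layers described above (a minimal sketch, not GROMACS source: the toy pair potential and data layout are invented for brevity), MPI ranks each own a domain, OpenMP threads split the pair loop within a rank, and the inner loop is left in a simple form a compiler can auto-vectorize with SIMD:

      // Schematic only: MPI between domains, OpenMP threads within a rank,
      // and an inner loop amenable to compiler SIMD vectorization.
      // Build with e.g.: mpicxx -fopenmp -O3 levels.cpp
      #include <mpi.h>
      #include <omp.h>
      #include <cmath>
      #include <vector>

      static double domain_energy(const std::vector<double>& x) {
          double e = 0.0;
          #pragma omp parallel for reduction(+ : e)   // thread level
          for (long i = 0; i < (long)x.size(); ++i)
              for (long j = i + 1; j < (long)x.size(); ++j) {
                  const double r = std::fabs(x[i] - x[j]) + 1e-9;
                  e += 1.0 / (r * r);                 // toy pair potential
              }
          return e;
      }

      int main(int argc, char** argv) {
          MPI_Init(&argc, &argv);                     // process level
          int rank = 0;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          std::vector<double> x(1024);
          for (size_t i = 0; i < x.size(); ++i)
              x[i] = 1000.0 * rank + (double)i;       // this rank's "domain"
          double e = domain_energy(x), etot = 0.0;
          MPI_Allreduce(&e, &etot, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
          MPI_Finalize();
          return 0;
      }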
  • ABSTRACT: GROMACS (GROningen MAchine for Chemical Simulations) is a molecular dynamics package primarily designed for simulations of proteins, lipids and nucleic acids. It was originally developed in the Biophysical Chemistry department of the University of Groningen, and is now maintained by contributors in universities and research centers across the world.
    10/2014; Royal Institute of Technology and Uppsala University, Sweden.
  • ABSTRACT: The accuracy of electrostatic interactions in molecular dynamics advanced tremendously with the introduction of particle-mesh Ewald (PME) summation almost 20 years ago. Lattice summation electrostatics is now the de facto standard for most types of biomolecular simulations, and in particular, for lipid bilayers, it has been a critical improvement due to the large charges typically present in zwitterionic lipid headgroups. In contrast, Lennard-Jones interactions have continued to be handled with increasingly longer cutoffs, partly because few alternatives have been available despite significant difficulties in tuning cutoffs and parameters to reproduce lipid properties. Here, we present a new Lennard-Jones PME implementation applied to lipid bilayers. We confirm that long-range contributions are well approximated by dispersion corrections in simple systems such as pentadecane (which makes parameters transferable), but for inhomogeneous and anisotropic systems such as lipid bilayers there are large effects on surface tension, resulting in up to 5.5% deviations in area per lipid and order parameters, far larger than many differences for which reparameterization has been attempted. We further propose an approximation for combination rules in reciprocal space that significantly reduces the computational cost of Lennard-Jones PME and makes accurate treatment of all nonbonded interactions competitive with simulations employing long cutoffs. These results could potentially have broad impact on important applications such as membrane proteins and free energy calculations.
    Journal of Chemical Theory and Computation 07/2013; 9(8):3527–3537. DOI:10.1021/ct400140n · 5.31 Impact Factor
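    For orientation, the splitting that Lennard-Jones PME rests on can be sketched in standard dispersion-Ewald notation (a textbook sketch, not a transcription of the paper's equations): the r^-6 dispersion is separated into a short-ranged part summed directly within the cutoff and a smooth remainder evaluated on the PME mesh, and the mesh part is pair-separable only when the C6 coefficients combine geometrically, which is the reciprocal-space combination-rule approximation referred to above.

      \[ \frac{1}{r^{6}} \;=\; \underbrace{\frac{g(\beta r)}{r^{6}}}_{\text{direct sum, cut off}} \;+\; \underbrace{\frac{1-g(\beta r)}{r^{6}}}_{\text{PME mesh}}, \qquad g(x) \;=\; \Bigl(1 + x^{2} + \tfrac{x^{4}}{2}\Bigr)\,e^{-x^{2}}, \qquad C_{6,ij} \;\approx\; \sqrt{C_{6,ii}\,C_{6,jj}} \ \ \text{(mesh part only)} \]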
  • Szilárd Páll, Berk Hess
    ABSTRACT: Calculating interactions or correlations between pairs of particles is typically the most time-consuming task in particle simulation or correlation analysis. Straightforward implementations using a double loop over particle pairs have traditionally worked well, especially since compilers usually do a good job of unrolling the inner loop. In order to reach high performance on modern CPU and accelerator architectures, single-instruction multiple-data (SIMD) parallelization has become essential. Avoiding memory bottlenecks is also increasingly important and requires reducing the ratio of memory to arithmetic operations. Moreover, when pairs only interact within a certain cut-off distance, good SIMD utilization can only be achieved by reordering input and output data, which quickly becomes a limiting factor. Here we present an algorithm for SIMD parallelization based on grouping a fixed number of particles, e.g. 2, 4, or 8, into spatial clusters. Calculating all interactions between particles in a pair of such clusters improves data reuse compared to the traditional scheme and results in a more efficient SIMD parallelization. Adjusting the cluster size allows the algorithm to map to SIMD units of various widths. This flexibility not only enables fast and efficient implementation on current CPUs and accelerator architectures like GPUs or Intel MIC, but it also makes the algorithm future-proof. We present the algorithm with an application to molecular dynamics simulations, where we can also make use of the effective buffering the method introduces.
    Computer Physics Communications 06/2013; 184(12). DOI:10.1016/j.cpc.2013.06.003 · 2.41 Impact Factor
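    A scalar sketch of the cluster-pair idea (illustrative only: the cluster size, structure layout and placeholder force are ours, not the paper's kernels; a production kernel maps the j loop onto SIMD lanes and prunes cluster pairs by a cluster-level distance check). Particles are grouped into clusters of M, the pair list is built over cluster pairs, and all M x M interactions of a pair are computed with regular, reorder-free memory access:

      #include <array>

      constexpr int M = 4;  // cluster size; 2, 4 or 8 to match SIMD width
      struct Cluster {      // structure-of-arrays layout for SIMD loads
          std::array<float, M> x{}, y{}, z{}, fx{}, fy{}, fz{};
      };

      // Compute all M x M interactions of one cluster pair (assumes ci != cj);
      // cluster j's coordinates are reused M times, improving data reuse.
      void cluster_pair_kernel(Cluster& ci, Cluster& cj, float rc2) {
          for (int i = 0; i < M; ++i)
              for (int j = 0; j < M; ++j) {
                  const float dx = ci.x[i] - cj.x[j];
                  const float dy = ci.y[i] - cj.y[j];
                  const float dz = ci.z[i] - cj.z[j];
                  const float r2 = dx * dx + dy * dy + dz * dz;
                  if (r2 > rc2) continue;            // per-pair cutoff check
                  const float f = 1.0f / (r2 * r2);  // placeholder force / r
                  ci.fx[i] += f * dx; ci.fy[i] += f * dy; ci.fz[i] += f * dz;
                  cj.fx[j] -= f * dx; cj.fy[j] -= f * dy; cj.fz[j] -= f * dz;
              }
      }

      int main() {
          Cluster a, b;
          for (int k = 0; k < M; ++k) { a.x[k] = (float)k; b.x[k] = k + 1.5f; }
          cluster_pair_kernel(a, b, 9.0f);  // 9.0 = cutoff squared
          return 0;
      }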
  • ABSTRACT: The training event includes tutorials on basic and advanced usage of two major packages for molecular dynamics simulations, GROMACS and AMBER, with a focus on their application to modelling of biomolecular systems. The sessions also include presentations of two portals for automated job submission developed by the WeNMR (wenmr.eu) and ScalaLife (scalalife.eu) projects.
    EGI COMMUNITY FORUM 2013; 04/2013
  • Article: GROMACS 4.5
    ABSTRACT: Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including Windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is open-source and free software available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online.
    Bioinformatics 04/2013; 29(7):845-854. DOI:10.1093/bioinformatics/btt055 · 4.62 Impact Factor
  • ABSTRACT: The ScalaLife project (Scalable Software Services for Life Science) started in September 2010. It develops new hierarchical parallelization approaches, explicitly based on ensemble and high-throughput computing, for new multi-core and streaming/GPU architectures, and establishes open software standards for data storage and exchange. The project implements, documents, and maintains these techniques in pilot European open-source codes such as the widely used GROMACS and DALTON, as well as a new application for ensemble simulation (DISCRETE). ScalaLife has created a Competence Centre for scalable life-science software to strengthen Europe as a major software provider and to enable the community to exploit e-Infrastructures to their full extent. This Competence Centre provides training and a support infrastructure, and establishes a long-term framework for the maintenance and optimization of life-science codes.
  • ABSTRACT: Cellular lipid membranes are spatially inhomogeneous soft materials. Materials properties such as pressure and surface tension thus show important microscopic-scale variation that is critical to many biological functions. We present a means to calculate pressure and surface tension in a 3D-resolved manner within molecular-dynamics simulations and show how such measurements can yield important insight. We also present the first corrections to local virial and pressure fields to account for the constraints typically used in lipid simulations that otherwise cause problems in highly oriented systems such as bilayers. Based on simulations of an asymmetric bacterial ion channel in a POPC bilayer, we demonstrate how 3D-resolved pressure can probe for both short-range and long-range effects from the protein on the membrane environment. We also show how surface tension is a sensitive metric for inter-leaflet equilibrium and can be used to detect even subtle imbalances between bilayer leaflets in a membrane-protein simulation. Since surface tension is known to modulate the function of many proteins, this effect is an important consideration for predictions of ion channel function. We outline a strategy by which our local pressure measurements, which we make available within a version of the GROMACS simulation package, may be used to design optimally equilibrated membrane-protein simulations.
    Chemistry and Physics of Lipids 01/2013; 169. DOI:10.1016/j.chemphyslip.2013.01.001 · 2.59 Impact Factor
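    For reference, grid-resolved local pressure methods of this kind discretize the classical Irving-Kirkwood form of the pressure field (a textbook sketch; the paper's estimator additionally corrects the local virial for constraints), and the surface tension of a bilayer with its normal along z then follows from the normal/lateral anisotropy:

      \[ P_{\alpha\beta}(\mathbf{r}) \;=\; \sum_i m_i\, v_{i\alpha} v_{i\beta}\,\delta(\mathbf{r}-\mathbf{r}_i) \;+\; \sum_{i<j} F_{ij,\alpha}\, r_{ij,\beta} \int_0^1 \delta\bigl(\mathbf{r}-\mathbf{r}_i-\lambda\,\mathbf{r}_{ij}\bigr)\,\mathrm{d}\lambda \]

      \[ \gamma \;=\; \int \Bigl[\, P_{zz}(z) \;-\; \tfrac{1}{2}\bigl(P_{xx}(z)+P_{yy}(z)\bigr) \Bigr]\,\mathrm{d}z \]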
  • ABSTRACT: The gating of voltage-gated ion channels is controlled by the arginine-rich S4 helix of the voltage-sensor domain moving in response to an external potential. Recent studies have suggested that S4 moves in three to four steps to open the conducting pore, thus visiting several intermediate conformations during gating. However, the exact conformational changes are not known in detail. For instance, it has been suggested that there is a local rotation in the helix corresponding to short segments of a 3(10)-helix moving along S4 during opening and closing. Here, we have explored the energetics of the transition between the fully open state (based on the X-ray structure) and the first intermediate state towards channel closing (C1), modeled from experimental constraints. We show that conformations within 3 Å of the X-ray structure are obtained in simulations starting from the C1 model, and directly observe the previously suggested sliding 3(10)-helix region in S4. Through systematic free energy calculations, we show that the C1 state is a stable intermediate conformation and determine free energy profiles for moving between the states without constraints. Mutations indicate several residues in a narrow hydrophobic band in the voltage sensor contribute to the barrier between the open and C1 states, with F233 in the S2 helix having the largest influence. Substitution for smaller amino acids reduces the transition cost, while introduction of a larger ring increases it, largely confirming experimental activation shift results. There is a systematic correlation between the local aromatic ring rotation, the arginine barrier crossing, and the corresponding relative free energy. In particular, it appears to be more advantageous for the F233 side chain to rotate towards the extracellular side when arginines cross the hydrophobic region.
    PLoS ONE 10/2012; 7(10):e45880. DOI:10.1371/journal.pone.0045880 · 3.53 Impact Factor
  • ABSTRACT: The Life Sciences have rapidly become one of the major beneficiaries of the European e-Infrastructures, placing a growing demand on the capabilities of simulation software and on the support services. The ScalaLife project has set out to address some of the specific problems associated with this growth, acting along two distinct and complementary directions. On the one hand, the project is concerned with the discrepancy between the scalability advances made by e-Infrastructure projects such as PRACE/DEISA on large molecular systems and the reality of the typical Life Science simulation, which works predominantly with small-to-medium systems. Thus, ScalaLife is implementing new techniques for efficient small-to-medium system parallelisation, developing new hierarchical approaches (explicitly based on ensemble and high-throughput computing for new multi-core and streaming/GPU architectures) and establishing open software standards for data storage and exchange. On the other hand, the project is committed to the long-term support of the Life Science users and communities, providing both training and expert advice. First, ScalaLife is documenting and developing training material for the new techniques and data storage formats implemented by the project. Second, the project has created a prototype for a cross-disciplinary Competence Centre, which enables the Life Science community to exploit the key European applications developed as part of the project as well as the existing European e-Infrastructures effectively. By providing a training and support infrastructure and by developing a centre of excellence and associated policies to foster collaboration, the Competence Centre establishes a long-term structure for the maintenance and optimisation of Life Science software.
    eChallenges e-2012, Lisbon, Portugal; 10/2012
  • ABSTRACT: This work evaluates the performance of GROMACS on different platforms and determines the optimal set of conditions on given architectures for petascale molecular dynamics simulations. The activities were organized into three tasks within the PRACE project: (i) optimization of GROMACS performance on Blue Gene systems; (ii) parallel scaling of the OpenMP implementation; (iii) development of a multiple-step-size symplectic integrator adapted to large biomolecular systems. Part of the results reported here was achieved through collaboration with the ScalaLife project.
  • Biophysical Journal 01/2012; 102(3):13-. DOI:10.1016/j.bpj.2011.11.096 · 3.83 Impact Factor
  • David van der Spoel, Berk Hess
    ABSTRACT: Molecular dynamics (MD) simulations form a powerful tool that is complementary to experiments and theory. They allow detailed investigations of both biological and chemical systems at the atomic level at timescales ranging from femtoseconds to milliseconds. Mechanisms and processes not accessible to experimental techniques can be followed in ‘real time’, and hypotheses based on experiments or theoretical arguments can be tested. Limits on the accuracy of results are mainly due to the physical models and to the ratio of the complexity of the problem to the amount of available computer time. Here, we review the state of the art in MD simulations with a focus on imminent challenges for the GROMACS (GROningen MAchine for Chemical Simulation) software. New hardware puts new requirements on software, while the breadth of applications and the number of physical models implemented are increasing rapidly, highlighting shortcomings in the architecture of the programs. We sketch a road map for a popular scientific software package and discuss some of the choices to be made.
    04/2011; 1(5):710-715. DOI:10.1002/wcms.50
  • ABSTRACT: The activation of voltage-gated ion channels is controlled by the S4 helix, with arginines every third residue. The X-ray structures are believed to reflect an open-inactivated state, and models propose combinations of translation, rotation, and tilt to reach the resting state. Recently, experiments and simulations have independently observed occurrence of 3(10)-helix in S4. This suggests S4 might make a transition from α- to 3(10)-helix in the gating process. Here, we show that 3(10)-helix structure between Q1 and R3 in the S4 segment of a voltage sensor appears to facilitate the early stage of the motion toward a down state. We use multiple microsecond steered molecular simulations to calculate the work required for translating S4 both as an α-helix and transformed to a 3(10)-helix. The barrier appears to be caused by salt-bridge reformation simultaneous to R4 passing the F233 hydrophobic lock, and it is almost a factor of two lower with the 3(10)-helix. The latter facilitates translation because R2/R3 line up to face E183/E226, which reduces the requirement to rotate S4. This is also reflected in a lower root mean-square deviation distortion of the rest of the voltage sensor. This supports the 3(10) hypothesis, and could explain some of the differences between the open-inactivated versus activated states.
    Biophysical Journal 03/2011; 100(6):1446-54. DOI:10.1016/j.bpj.2011.02.003 · 3.83 Impact Factor
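    For context, in constant-velocity steered MD of this kind the work profile is accumulated from the instantaneous force in the moving harmonic restraint; in generic textbook form, with spring constant k, pulling velocity v, initial restraint position z_0 and pulled coordinate z(t):

      \[ W(t) \;=\; \int_0^{t} v\,k\,\bigl[z_0 + v t' - z(t')\bigr]\,\mathrm{d}t' \]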
  • ABSTRACT: We performed molecular simulations to study ion pairing in aqueous solutions. Our results indicate that ion-specific interactions of Li+, Na+, and K+ with the dimethyl phosphate anion are solvent-mediated. The same mechanism applies to carboxylate ions, as has been illustrated in earlier simulations of aqueous alkali acetate solutions. Contact ion pairs play only a minor role, or no role at all, in determining the solution structure and ion-specific thermodynamics of these systems. On the basis of the Kirkwood-Buff theory of solution, we furthermore show that the well-known reversal of the Hofmeister series of salt activity coefficients, comparing chloride or bromide with dimethyl phosphate or acetate, is caused by changing from a contact pairing mechanism in the former system to a solvent-mediated interaction mechanism in the latter system.
    The Journal of Physical Chemistry B 03/2011; 115(13):3734-9. DOI:10.1021/jp201150q · 3.38 Impact Factor
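    The Kirkwood-Buff route mentioned above links pair distribution functions g_ij(r) to solution thermodynamics through the integrals G_ij; in the standard Ben-Naim form for a solute s in solvent w (a sketch of the textbook relations, not the paper's derivation), the concentration dependence of the activity coefficient is controlled by the difference G_ss - G_sw:

      \[ G_{ij} \;=\; 4\pi \int_0^{\infty} \bigl[g_{ij}(r)-1\bigr]\, r^{2}\,\mathrm{d}r, \qquad \left(\frac{\partial \ln \gamma_s}{\partial \ln \rho_s}\right)_{T,P} \;=\; -\,\frac{\rho_s\,(G_{ss}-G_{sw})}{1+\rho_s\,(G_{ss}-G_{sw})} \]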
  • ABSTRACT: High-performance implementations of molecular dynamics (MD) simulations play an important role in the study of macromolecules. Recent advances in both hardware and simulation software have extended the accessible time scales significantly, but the more complex algorithms used in many codes today occasionally make it difficult to understand the program flow and data structures without at least some knowledge about the underlying ideas used to improve performance. In this review, we discuss some of the currently most important areas of algorithm improvement to accelerate MD, including floating-point maths, techniques to accelerate nonbonded interactions, and methods to allow multiple or extended time steps. There is also a strong trend of increased parallelization on different levels, including distributed-memory domain decomposition, stream processing algorithms running, e.g., on graphics processing unit (GPU) hardware, and last but not least techniques to decouple simulations to enable massive parallelism on next-generation supercomputers or distributed computing. We describe the impact these algorithms are having on current performance, and also how we believe they can be combined in the future.
    01/2011; 1(1):93-108. DOI:10.1002/wcms.3
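    As one concrete instance of the multiple-time-step methods surveyed above, the reversible reference-system propagator (r-RESPA) factorization nests n short steps δt for the fast forces inside each long step Δt for the slow ones:

      \[ e^{iL\,\Delta t} \;\approx\; e^{iL_{\mathrm{slow}}\,\Delta t/2}\,\bigl[e^{iL_{\mathrm{fast}}\,\delta t}\bigr]^{n}\,e^{iL_{\mathrm{slow}}\,\Delta t/2}, \qquad \Delta t = n\,\delta t \]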
  • ABSTRACT: Several GPU-based algorithms have been developed to accelerate biomolecular simulations, but although they provide benefits over single-core implementations, they have not been able to surpass the performance of state-of-the-art SIMD CPU implementations (e.g. GROMACS), not to mention efficient scaling. Here, we present a heterogeneous parallelization that utilizes both CPU and GPU resources efficiently. A novel fixed-particle-number sub-cell algorithm for non-bonded force calculation was developed. The algorithm uses the SIMD width as its algorithmic work unit and is intrinsically future-proof, since it can be adapted to future hardware. The CUDA non-bonded kernel implementation achieves up to 60% work efficiency, 1.5 IPC, and 95% L1 cache utilization. On the CPU, OpenMP-parallelized, SSE-accelerated code runs overlapping with GPU execution. Fully automated dynamic inter-process as well as CPU-GPU load balancing is employed. We achieve a threefold speedup compared to equivalent GROMACS CPU code and show good strong and weak scaling. To the best of our knowledge, this is the fastest GPU molecular dynamics implementation presented to date.
    Conference on High Performance Computing Networking, Storage and Analysis - Companion Volume, SC 2011, Seattle, WA, USA, November 12-18, 2011; 01/2011
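    A minimal CUDA sketch of the overlap pattern described (schematic only: the kernel body and the CPU-side work are placeholders, not the GROMACS kernels): non-bonded work is launched asynchronously on a stream while the CPU computes its own share, and results are combined only after synchronization:

      #include <cuda_runtime.h>
      #include <vector>

      __global__ void nonbonded_kernel(const float* x, float* f, int n) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i < n) f[i] = -x[i];                    // placeholder "force"
      }

      int main() {
          const int n = 1 << 20;
          std::vector<float> x(n, 1.0f), f(n, 0.0f);
          float *dx, *df;
          cudaMalloc(&dx, n * sizeof(float));
          cudaMalloc(&df, n * sizeof(float));
          cudaStream_t s;
          cudaStreamCreate(&s);

          // Launch GPU non-bonded work asynchronously ...
          cudaMemcpyAsync(dx, x.data(), n * sizeof(float),
                          cudaMemcpyHostToDevice, s);
          nonbonded_kernel<<<(n + 255) / 256, 256, 0, s>>>(dx, df, n);
          cudaMemcpyAsync(f.data(), df, n * sizeof(float),
                          cudaMemcpyDeviceToHost, s);

          // ... while the CPU overlaps with its share of the work
          // (bonded forces, PME, etc. in a real MD engine).
          double cpu_work = 0.0;
          for (int i = 0; i < n; ++i) cpu_work += x[i];

          cudaStreamSynchronize(s);                   // combine results here
          cudaFree(dx); cudaFree(df); cudaStreamDestroy(s);
          return 0;
      }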
  • ABSTRACT: Biomolecular simulation is a core application on supercomputers, but it is exceptionally difficult to achieve the strong scaling necessary to reach biologically relevant timescales. Here, we present a new paradigm for parallel adaptive molecular dynamics and a publicly available implementation: Copernicus. This framework combines performance-leading molecular dynamics parallelized on three levels (SIMD, threads, and message-passing) with kinetic clustering, statistical model building and real-time result monitoring. Copernicus enables execution as single parallel jobs with automatic resource allocation. Even for a small protein such as villin (9,864 atoms), Copernicus exhibits near-linear strong scaling from 1 to 5,376 AMD cores. Starting from extended chains we observe structures 0.6 Å from the native state within 30 h, and achieve sufficient sampling to predict the native state without a priori knowledge after 80-90 h. To match Copernicus' efficiency, a classical simulation would have to exceed 50 microseconds per day, currently infeasible even with custom hardware designed for simulations.
    Conference on High Performance Computing Networking, Storage and Analysis, SC 2011, Seattle, WA, USA, November 12-18, 2011; 01/2011
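    The adaptive paradigm can be sketched as an outer loop over simulate/cluster/reseed phases (purely schematic C++: State, Trajectory, and every helper below are hypothetical stand-ins with stub bodies; Copernicus itself automates this flow across distributed resources):

      #include <vector>

      struct State      { double x = 0.0; };  // stand-in for a conformation
      struct Trajectory { State last; };      // stand-in for an MD trajectory

      // Stub "MD engine": each seed advances a little per round.
      std::vector<Trajectory> run_ensemble(const std::vector<State>& seeds) {
          std::vector<Trajectory> t;
          for (const auto& s : seeds) t.push_back({State{s.x + 0.1}});
          return t;
      }
      // Stub kinetic clustering / reseeding: pass end states through.
      std::vector<State> cluster_and_reseed(const std::vector<Trajectory>& t) {
          std::vector<State> c;
          for (const auto& tr : t) c.push_back(tr.last);
          return c;
      }
      // Stub convergence test, e.g. "native state reached".
      bool converged(const std::vector<State>& s) { return s[0].x > 1.0; }

      // Adaptive-sampling outer loop of the kind Copernicus automates:
      // simulate an ensemble, build a kinetic model, reseed where the
      // model is least certain, repeat until converged.
      int main() {
          std::vector<State> seeds(4);
          while (!converged(seeds)) {
              auto trajs = run_ensemble(seeds);      // many parallel MD jobs
              seeds = cluster_and_reseed(trajs);     // clustering + reseeding
          }
          return 0;
      }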

Publication Stats

12k Citations
156.45 Total Impact Points

Institutions

  • 2011–2013
    • KTH Royal Institute of Technology
      • Department of Theoretical Physics
Stockholm, Sweden
  • 2010–2011
    • Stockholm University
      • Center for Biomembrane Research
Stockholm, Sweden
  • 2005–2010
    • Max Planck Institute for Polymer Research
Mainz, Rheinland-Pfalz, Germany
  • 2007
    • Koc University
Istanbul, Turkey
  • 1998–2005
    • University of Groningen
      • Department of Applied Physics
      • Groningen Biomolecular Sciences and Biotechnology Institute (GBB)
      Groningen, Province of Groningen, Netherlands