
ms2: A Molecular Simulation Tool for Thermodynamic Properties

Stephan Deublein^a, Bernhard Eckl^b, Jürgen Stoll^b, Sergey V. Lishchuk^c, Gabriela Guevara-Carrion^a, Colin W. Glass^d, Thorsten Merker^a, Martin Bernreuther^d, Hans Hasse^a, Jadran Vrabec^{e,∗}

^a Lehrstuhl für Thermodynamik, Universität Kaiserslautern, 67653 Kaiserslautern, Germany

^b Institut für Technische Thermodynamik und Thermische Verfahrenstechnik, Universität Stuttgart, 70550 Stuttgart, Germany

^c Department of Mathematics, University of Leicester, Leicester LE1 7RH, United Kingdom

^d Höchstleistungsrechenzentrum Universität Stuttgart (HLRS), 70550 Stuttgart, Germany

^e Lehrstuhl für Thermodynamik und Energietechnik, Universität Paderborn, 33098 Paderborn, Germany

Abstract

This work presents the molecular simulation program ms2, which is designed for the calculation of thermodynamic properties of bulk fluids in equilibrium consisting of small electro-neutral molecules. ms2 features the two main molecular simulation techniques, molecular dynamics (MD) and Monte-Carlo. It supports the calculation of vapor-liquid equilibria of pure fluids and multi-component mixtures described by rigid molecular models on the basis of the grand equilibrium method. Furthermore, it is capable of sampling various classical ensembles and yields numerous thermodynamic properties. To evaluate the chemical potential, Widom’s test molecule method and gradual insertion are implemented. Transport properties are determined by equilibrium MD simulations following the Green-Kubo formalism. ms2 is designed to meet the requirements of academia and industry, particularly short response times and straightforward handling. It is written in Fortran90 and optimized for fast execution on a broad range of computer architectures, ranging from single-processor PCs through PC clusters and vector computers to high-end parallel machines. The standard Message Passing Interface (MPI) is used for parallelization, so ms2 is easily portable to a broad range of computing platforms. Auxiliary feature tools facilitate the interaction with the code and the interpretation of input and output files. The accuracy and reliability of ms2 have been demonstrated for a large variety of fluids in preceding work.

Keywords: Molecular simulation, molecular dynamics, Monte-Carlo, grand equilibrium method, vapor-liquid equilibrium, transport properties

1. Program summary

Manuscript: ms2: A Molecular Simulation Program for Thermodynamic Properties

Authors: Stephan Deublein, Bernhard Eckl, Jürgen Stoll, Sergey V. Lishchuk, Gabriela Guevara-Carrion, Colin W. Glass, Thorsten Merker, Martin Bernreuther, Hans Hasse, Jadran Vrabec (jadran.vrabec@upb.de)

Title of program: ms2

Operating system: Unix/Linux, Windows

∗Corresponding author: Jadran Vrabec, Warburger Str. 100, 33098 Paderborn, Germany, Tel.: +49-5251/60-2421, Fax: +49-5251/60-3522, Email: jadran.vrabec@upb.de

Preprint submitted to Elsevier March 10, 2011

Computer: The simulation program ms2 is usable on a wide variety of platforms, from single-processor machines through PC clusters and vector computers to vector-parallel architectures (tested with the Fortran compilers gfortran, Intel, PathScale, Portland Group and Sun Studio).

Memory: ms2 runs on single processors with 512 MB RAM. The memory demand rises with the number of processors used per node and with the number of molecules.

Distribution format: tar.gz

Keywords: Molecular simulation, molecular dynamics, Monte-Carlo, grand equilibrium method, vapor-liquid equilibrium, transport properties, parallel algorithms

Programming language used: Fortran90

External: Message passing interface (MPI).

Classification: 7.7, 7.9, 12

Vectorised: Yes. Message Passing Interface (MPI) protocol

Scalability: Excellent scalability up to 16 processors for molecular dynamics and >512 processors for Monte-Carlo simulations.

Nature of problem: Calculation of application-oriented thermodynamic properties for rigid electro-neutral molecules: vapor-liquid equilibria of pure fluids and multi-component mixtures, thermal and caloric data as well as transport properties.

Method of solution: Molecular dynamics, Monte-Carlo, various classical ensembles, grand equilibrium method, Green-Kubo formalism

Restrictions: None. The system size is user-defined. Typical problems addressed by ms2 can be solved by simulating systems of 2000 molecules or fewer.

Unusual Features: Auxiliary feature tools are available for creating input files, analyzing simulation results and visualizing molecular trajectories.

Additional comments: Sample makefiles for multiple operating platforms are provided.

Documentation: Documentation is provided with the installation package and is available at http://www.ms-2.de.

Typical running time: The running time of ms2 depends on the specified problem, the system size and the number of processes used in the simulation. Running four processes on a "Nehalem" processor, simulations of vapor-liquid equilibrium data take between 2 and 12 hours, and simulations of transport properties between 6 and 24 hours.


2. Introduction

Due to advances in computing power and methodological efficiency, as well as the development of accurate force fields, it is anticipated that “molecular modeling and simulation will become a breakthrough technology that is widely accepted in the chemical industry and applied in conjunction with other predictive methods to meet the industry’s evolving fluid property data needs” [1]. At the core of this prospect lies the sound physical basis of molecular modeling and simulation. It allows an adequate description of structure, energetics and dynamics on the microscopic level, which govern material properties on the macroscopic level. It therefore provides convenient access to thermodynamic properties, particularly if they are difficult or expensive to obtain by experiment, e.g. at high temperatures and pressures, or cannot be measured at all, e.g. when new substances are not available in sufficient quantity. Furthermore, for toxic, explosive or otherwise hazardous substances, measuring thermodynamic properties experimentally can be infeasible even at moderate conditions, while molecular modeling and simulation offers a reliable route [2].

Presently, the chemical industry is extending the range of thermodynamic models in regular use from phenomenological equations of state (EOS) or excess Gibbs energy models to more advanced types, such as statistical EOS like SAFT [3, 4] or continuum solvation models like COSMO-RS [5, 6]. It should be noted that all these advanced models are developed using a bottom-up approach, i.e. they bridge molecular properties to the macroscopic level. These methods mainly offer access to aggregated properties, such as the Helmholtz energy of bulk fluids and its derivatives. They are, however, not suitable for transport properties.

Molecular modeling and simulation is applicable with very few constraints, as both static and dynamic thermodynamic data may be calculated, be it in bulk fluids or in confinement. Both equilibrium and non-equilibrium conditions may be studied. Additionally, it provides detailed insight into the mechanisms on the microscopic level. This versatility, however, comes at a computational cost that is orders of magnitude larger than that of the methods mentioned above. Traditionally, molecular simulations were carried out at the computing centers of national institutions or universities on powerful computing equipment, which is usually unavailable even in large chemical companies.

However, with the increase in computing power, molecular simulations are becoming feasible even on off-the-shelf hardware, provided suitable simulation programs are available. The scope of data accessible within reasonable time, even in industrial workflows, is increasing rapidly. Molecular modeling and simulation can therefore contribute to a reduction of process development time and production costs [7].

Molecular modeling and simulation is rewarding because it provides reliable predictions of essentially all thermodynamic properties in a consistent manner [8]. In principle, molecular models yield predictions for any property at any condition. This is of particular interest for industrial applications, where a wide variety of properties needs to be known.

To further promote this development, the Industrial Fluid Properties Simulation Collective [1] has organized six Simulation Challenges to date [8, 9, 10, 11, 12, 1]. The goal of these Simulation Challenges is to assess the capabilities of molecular methods for typical industrial tasks where classical methods are insufficient.


This paper presents the molecular simulation program ms2, which successfully competed in two Simulation Challenges, finishing second in 2006 [11] and first in the following year [8]. ms2 is aimed at the calculation of thermodynamic properties needed for applications in the chemical industry, without requiring expert knowledge of the user. It was developed for rigid molecular models with a focus on condensed phases, covering mixtures that contain an arbitrary number of small molecular species.

The simulation program ms2 is optimized for fast execution on a broad range of computer architectures, ranging from single-processor PCs through PC clusters and vector computers to high-end parallel machines. It is a standalone Fortran90 code that does not require any libraries or other software prerequisites. ms2 is distributed as source code and only needs to be compiled. The structure of ms2 is modular and object-oriented, except at the very core of the energy and force calculations, where some structural compromises were made to achieve better performance.

ms2 is aimed at homogeneous bulk systems in equilibrium. For vapor-liquid equilibrium (VLE) calculations, the coexisting phases are simulated independently and subsequently. Thus, it is usually sufficient to sample molecular systems containing on the order of 10³ molecules to obtain statistically reliable results [13]. With standard cut-off radii, the intermolecular interactions are evaluated up to roughly the maximal distance permitted by the edge length of the cubic simulation volume. Spatial decomposition schemes for parallelization are therefore not worthwhile. Instead, Plimpton’s method [14] was implemented for the parallel execution of molecular dynamics (MD); it scales reasonably well up to 16 or 32 processors, depending on the particular architecture. For Monte-Carlo (MC), an optimal parallelization scheme was introduced that takes advantage of the stochastic nature of such simulations: the computing resources are split into single runs, each on one node, which sample the phase space independently. Both parallelization methods allow for short response times.¹ A typical molecular simulation of a VLE state point takes roughly six hours on a current workstation. The main expenses for determining thermodynamic data in silico are the installation and maintenance of computing equipment and/or computing time, which is presently below US$ 0.35/CPU-hour [15].
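Because the MC runs sample the phase space independently, their results can be pooled after the fact. The following Python sketch shows one plausible way to do this under the assumption that all runs sampled the same state point for the same number of steps, so that equal weights are appropriate; it is not necessarily how ms2 combines its runs, and the function name is hypothetical.

```python
import math


def combine_independent_runs(run_means):
    """Pool the averages of statistically independent runs of equal length.

    Hypothetical helper: assumes every run sampled the same state point for the
    same number of steps, so equal weights are appropriate.
    """
    n = len(run_means)
    if n < 2:
        raise ValueError("at least two independent runs are needed")
    mean = sum(run_means) / n
    var = sum((m - mean) ** 2 for m in run_means) / (n - 1)  # unbiased sample variance
    return mean, math.sqrt(var / n)                          # mean and its standard error


# e.g. one property obtained from four independent MC runs (made-up numbers)
mean, err = combine_independent_runs([1.0, 1.2, 0.8, 1.0])
```

Since the runs are independent, the spread between them directly yields a standard error without any correction for time correlation.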

The simulation program ms2 presented here was developed in a long-term cooperation of engineers and computer scientists. With the release of ms2, it is intended to facilitate the transfer of a state-of-the-art and user-friendly molecular modeling and simulation package to academic institutions and the chemical industry. The release package consists of the simulation program ms2 itself and auxiliary feature tools for setup and analysis, including a simple 3D visualization tool to monitor the molecular trajectories. Furthermore, accurate molecular models for more than 100 small molecules are provided.

In sections 3 and 4 of this paper, ms2 and the implemented thermodynamic properties are outlined. In section 5, the structure of the code is explained, while section 6 compares its performance to similar simulation programs and section 7 describes the auxiliary feature tools. Section 8 concludes the paper and offers a brief outlook on future developments and improvements.

¹ By “response time”, we mean the real (wall-clock) time between the submission and the termination of a simulation run.


3. Simulation program ms2

The simulation program ms2 samples the phase space of rigid electro-neutral molecules by applying the two most fundamental molecular simulation techniques, i.e. MD and MC. MD simulations rely on the numerical solution of Newton’s equations of motion: at a given point in time, the intermolecular interactions, in particular the resulting forces and torques, are evaluated and treated as constant for a specified time step. They are the driving forces of the molecular motion. On that basis, the displacements over the time step are calculated, resulting in a new configuration. This process is repeated in a loop. The chronologically ordered configurations form a time-discretized approximation of a molecular process. Both static and dynamic thermodynamic properties are determined via time averages. MC simulation, in contrast, explores the phase space of a given molecular system stochastically. Molecules in the simulation volume are displaced randomly, and the probability of accepting a displacement is chosen such that a representative set of configurations is obtained. The Markov chain of configurations generated in this way allows for a rigorous calculation of static thermodynamic properties via ensemble averages.

3.1. Overview

The simulation program ms2 allows for the determination of static and dynamic thermodynamic properties in equilibrium. The implemented static properties are:

•Thermal and caloric properties

•Chemical potential

•Vapor-liquid equilibria

•Henry’s law constant

•Second virial coefficient

Transport properties can be calculated on the fly during an MD simulation at a moderate additional computational cost using the Green-Kubo formalism [16, 17]. The implemented dynamic properties are:

•Self-diffusion coefficient

•Maxwell-Stefan diffusion coefficient

•Shear viscosity

•Bulk viscosity

The model class supported by ms2 consists of rigid molecules with multiple Lennard-Jones (LJ) 12-6 interaction sites and an arbitrary number of superimposed electrostatic sites [18, 19, 20, 21]. The supported electrostatic models are point charges, point dipoles and point quadrupoles, which can be positioned anywhere within the molecule. Currently, ms2 is designed for electro-neutral species.

The quality of thermodynamic properties calculated by molecular simulation is essentially determined by two factors: first, the employed molecular model, i.e. the force field, which fully defines the thermodynamic properties and deviates to some degree from the behavior of the real fluid; second, the sampling of the phase space during the simulation, which is associated with statistical uncertainties.

Molecular models have been investigated with ms2 in numerous cases in the past [22, 23, 24, 25, 26, 27, 28]. For many of these models, the geometric and electrostatic interaction parameters were taken from ab initio calculations. The remaining parameters were adjusted to reproduce the vapor pressure and the saturated liquid density of the respective pure substance. These molecular models, combined with ms2 and its analysis methods, allow for time-efficient, high-quality simulations.

For a given simulation method, the second factor can be influenced by the number of molecules and the number of sampled configurations: the more data are sampled, the lower the statistical uncertainties.

3.2. Methods

3.2.1. Ensembles

The following ensembles are currently supported by ms2:

•canonical ensemble (NVT) - MD and MC

•micro-canonical ensemble (NVE) - MD

•isobaric-isothermal ensemble (NpT) - MD and MC

•grand equilibrium method (pseudo-µVT) - MC

Most of these ensembles are well known and widely used [29]. The discussion here is therefore restricted to the grand equilibrium method, since it was developed recently [30] and is probably new to many readers.

The grand equilibrium method is a technique to determine the VLE of pure substances or mixtures. It is a two-step procedure in which the coexisting phases are simulated independently and subsequently. The specified thermodynamic variables for the VLE are the temperature $T$ and the composition $\mathbf{x}$ of the liquid. In the first step, one NpT simulation of the liquid phase is performed at $T$, $\mathbf{x}$ and some pressure $p_0$ to determine the chemical potentials $\mu_i^l$ and the partial molar volumes $v_i^l$ of all components (for a pure substance, the latter is simply the molar volume). If these entropic properties are determined by Widom’s test molecule method [31], either MD or MC can be used to sample the phase space. A more advanced technique for these properties, gradual insertion [32, 33] (see below), is implemented in combination with MC. On the basis of the chemical potentials and partial molar volumes at $p_0$, first-order Taylor expansions can be made for the pressure dependence

$$\mu_i^l(T,\mathbf{x},p) \approx \mu_i^l(T,\mathbf{x},p_0) + v_i^l(T,\mathbf{x},p_0)\,(p-p_0). \tag{1}$$

Note that ms2 yields $\mu_i(T,\mathbf{x},p) - \mu_i^{\mathrm{id}}(T)$, where $\mu_i^{\mathrm{id}}(T)$ is the temperature-dependent part of the ideal gas contribution to the chemical potential of the pure component. $\mu_i^{l,\mathrm{id}}(T)$ does not need to be determined for VLE calculations, because it cancels out when Eq. (1) is equated to the corresponding expression for the vapor.
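Evaluating Eq. (1) for every component is a one-line operation once the liquid run has delivered $\mu_i^l$ and $v_i^l$ at $p_0$. The Python sketch below assumes SI units (J/mol, m³/mol, Pa); the function name and the list-based layout are hypothetical and do not reflect ms2's internals.

```python
def liquid_chemical_potentials(mu_p0, v_p0, p, p0):
    """Apply the first-order expansion of Eq. (1) to every liquid component.

    mu_p0 : chemical potentials mu_i^l(T, x, p0) from the liquid NpT run [J/mol]
    v_p0  : partial molar volumes v_i^l(T, x, p0) from the same run [m^3/mol]
    p, p0 : target and reference pressure [Pa]
    """
    return [mu + v * (p - p0) for mu, v in zip(mu_p0, v_p0)]


# a liquid with v = 1.8e-5 m^3/mol: raising p by 0.1 MPa shifts mu by only ~1.8 J/mol
mu_at_p = liquid_chemical_potentials([-1000.0], [1.8e-5], 2.0e5, 1.0e5)
```

The small shift in the example illustrates why a first-order expansion can be adequate for a nearly incompressible liquid phase.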

In the second step, one pseudo-µVT simulation [30] is performed for the vapor phase on the basis of Eq. (1), which yields the saturated vapor state point of the VLE. This simulation takes place in a pseudo ensemble in the sense that the specified chemical potentials are not constant, but depend on the actual pressure of the vapor phase. However, experience based on hundreds of systems [25, 28] shows that the vapor phase simulation rapidly converges to the saturated vapor state point during equilibration, so that the equilibrium chemical potentials are effectively specified via the attained vapor pressure. The number of molecules varies during the vapor phase simulation of the grand equilibrium method. Starting from a specified number of molecules in an arbitrarily chosen gaseous state, ms2 adjusts the volume $V$ after an equilibration period, so that the vapor density does not have to be known in advance.

The grand equilibrium method has been widely used and compares favorably to other methods for vapor-liquid equilibrium simulations, as shown in section 6.

3.2.2. Integrators, thermostat and barostat

In ms2, two integrators are implemented to solve Newton’s equations of motion during MD simulation: leapfrog and Gear predictor-corrector. Both are well known and described in the literature [29]. The leapfrog integrator is a second-order integrator that requires little computational effort while being robust for many applications; its error scales quadratically with the chosen time step. The Gear predictor-corrector integrator is implemented in fifth order and is more accurate than the leapfrog integrator for small time steps [29]. As implemented in ms2, both integration schemes have a similar computational demand: despite its higher order, the Gear integrator is only 0.3% slower than the leapfrog algorithm for one complete MD time step of a molecular model composed of three LJ sites with three rotational degrees of freedom.
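The quadratic error scaling of the leapfrog scheme can be checked on a toy problem. The Python sketch below integrates a one-dimensional harmonic oscillator ($\ddot{x} = -x$, reduced units) with the kick-drift-kick form of leapfrog, which is algebraically equivalent to the staggered form; it is an illustration only, not ms2's Fortran implementation (which also treats the rotation of rigid molecules), and the function names are hypothetical.

```python
import math


def leapfrog(x, v, accel, dt, n_steps):
    """Integrate x'' = accel(x) with the kick-drift-kick form of leapfrog.

    Hypothetical helper for illustration; x and v are plain floats (1D).
    """
    a = accel(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * a      # half kick: v(t + dt/2)
        x = x + dt * v_half            # drift:     x(t + dt)
        a = accel(x)                   # force at the updated position
        v = v_half + 0.5 * dt * a      # half kick: v(t + dt)
    return x, v


def position_error(dt, t_end=1.0):
    """Absolute position error at t_end for x(0) = 1, v(0) = 0; exact x(t) = cos(t)."""
    n_steps = round(t_end / dt)
    x, _ = leapfrog(1.0, 0.0, lambda q: -q, dt, n_steps)
    return abs(x - math.cos(n_steps * dt))


ratio = position_error(0.01) / position_error(0.005)
print(f"error ratio for a halved time step: {ratio:.2f}")  # close to 4 for a second-order scheme
```

Halving the time step at a fixed simulated time reduces the error by roughly a factor of four, as expected for a second-order integrator.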

For MC, the Markov chain is generated by repeatedly perturbing the system through a translational or rotational move of a molecule and evaluating the resulting configuration with respect to its energy using the Metropolis acceptance criterion. For MD, the thermostat incorporated in ms2 is velocity scaling: the velocities are scaled such that the actual kinetic energy matches the specified temperature, with the scaling applied equally to all molecular degrees of freedom. The pressure is kept constant using Andersen’s barostat in MD and by random volume changes evaluated according to the Metropolis acceptance criterion in MC.
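The two building blocks named above can be sketched in a few lines of Python. The Metropolis test accepts a trial move with probability $\min[1, \exp(-\Delta U/k_BT)]$; the velocity scaling multiplies all velocities by $\sqrt{T_{\mathrm{target}}/T_{\mathrm{actual}}}$. As a simplifying assumption, the sketch only scales the $3N$ translational degrees of freedom, whereas ms2 applies the scaling to all molecular degrees of freedom; the function names are hypothetical.

```python
import math
import random


def metropolis_accept(delta_u, kT, rng):
    """Metropolis criterion: accept with probability min(1, exp(-delta_u / kT))."""
    if delta_u <= 0.0:
        return True
    return rng.random() < math.exp(-delta_u / kT)


def velocity_scaling(velocities, masses, kT_target):
    """Rescale translational velocities so that the kinetic temperature equals kT_target.

    Simplification (assumption): only the 3N translational degrees of freedom are
    treated; rotational degrees of freedom of rigid molecules are ignored here.
    """
    n_dof = 3 * len(velocities)
    kinetic = 0.5 * sum(m * (vx * vx + vy * vy + vz * vz)
                        for m, (vx, vy, vz) in zip(masses, velocities))
    kT_actual = 2.0 * kinetic / n_dof
    s = math.sqrt(kT_target / kT_actual)   # one common factor for all molecules
    return [(s * vx, s * vy, s * vz) for vx, vy, vz in velocities]


rng = random.Random(42)
accepted = sum(metropolis_accept(1.0, 1.0, rng) for _ in range(10000))  # fraction near exp(-1)
```

Passing an explicit `random.Random` instance keeps runs reproducible, which is convenient when comparing independent MC runs.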

3.2.3. Basics of simulations with ms2

The basic molecular simulation techniques employed in ms2 are well described in the literature [29, 34, 35] and are thus not repeated here. All non-standard methods implemented in ms2 are introduced in section 4 or in the Supplementary Material. The following paragraph briefly describes the most significant assumptions and techniques employed in ms2.

Simulations with ms2 are performed in a quasi-infinite Cartesian space, using periodic boundary conditions [36] and the minimum image convention [37]. The molecular interactions are considered to be pairwise additive, as in most other simulation packages; including three- and many-body interactions would increase the computational effort while the benefits are questionable. As usual, the computational effort in ms2 is reduced by introducing a cut-off radius up to which the intermolecular interactions are explicitly evaluated. The contributions of interactions with molecules beyond the cut-off radius are accounted for by correction schemes: for the electrostatic models, the reaction field method is included, while for the dispersive and repulsive interactions, a homogeneous fluid beyond the cut-off radius is assumed. These long-range contributions are added to all thermodynamic data calculated in ms2.
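For a single-site LJ 12-6 fluid, assuming a homogeneous fluid (radial distribution function $g(r) = 1$) beyond the cut-off radius $r_c$ leads to the standard analytical tail corrections $u_{\mathrm{LRC}} = \tfrac{8}{3}\pi\rho\varepsilon\sigma^3[\tfrac{1}{3}(\sigma/r_c)^9 - (\sigma/r_c)^3]$ per molecule and $p_{\mathrm{LRC}} = \tfrac{16}{3}\pi\rho^2\varepsilon\sigma^3[\tfrac{2}{3}(\sigma/r_c)^9 - (\sigma/r_c)^3]$. The Python sketch below evaluates them in reduced units; it covers neither multi-site models nor the reaction field treatment of electrostatics used in ms2, and the function name is hypothetical.

```python
import math


def lj_tail_corrections(rho, r_cut, eps=1.0, sigma=1.0):
    """Long-range corrections for a single-site LJ 12-6 fluid with g(r) = 1 beyond r_cut.

    Returns (energy correction per molecule, pressure correction).
    """
    sr3 = (sigma / r_cut) ** 3
    sr9 = sr3 ** 3
    u_lrc = (8.0 / 3.0) * math.pi * rho * eps * sigma ** 3 * (sr9 / 3.0 - sr3)
    p_lrc = (16.0 / 3.0) * math.pi * rho ** 2 * eps * sigma ** 3 * (2.0 * sr9 / 3.0 - sr3)
    return u_lrc, p_lrc


# dense LJ liquid with the common cut-off r_c = 2.5 sigma (reduced units)
u_lrc, p_lrc = lj_tail_corrections(rho=0.8, r_cut=2.5)
```

Both corrections are negative for typical cut-offs because the neglected tail is attractive, and they vanish as $r_c \to \infty$.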

Many thermodynamic properties calculated with ms2 are residual quantities, indicated by the superscript “res”. To compare them with data measured on the macroscopic level, the solely temperature-dependent ideal gas contributions have to be added. These ideal gas properties are accessible, e.g., by quantum chemical methods and can often be found in databases [38].

3.3. Thermodynamic properties

The simulation program ms2 calculates the thermodynamic properties on the fly from the trajectories (MD) or Markov chains (MC). The results are written to file with a specified frequency during the course of the simulation. The statistical uncertainties of all results are estimated with the block averaging method of Flyvbjerg and Petersen [39] and the error propagation law.
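The block averaging idea of Flyvbjerg and Petersen can be sketched as follows: the correlated time series is repeatedly coarse-grained by averaging neighboring pairs, and at every level the naive standard error $\sqrt{c_0/(n-1)}$ is recomputed, where $c_0$ is the (biased) variance of the $n$ block values. Once the blocks are longer than the correlation time, the estimate reaches a plateau that serves as the error estimate. This Python sketch is an illustration, not ms2's routine; the function name is hypothetical and detecting the plateau is left to the user.

```python
import math


def flyvbjerg_petersen(samples):
    """Return (number of blocks, standard-error estimate) for each blocking level."""
    data = list(samples)
    levels = []
    while len(data) >= 2:
        n = len(data)
        mean = sum(data) / n
        c0 = sum((x - mean) ** 2 for x in data) / n      # variance of the block values
        levels.append((n, math.sqrt(c0 / (n - 1))))      # naive error at this level
        # block transformation: average neighboring pairs (an odd leftover is dropped)
        data = [0.5 * (data[2 * i] + data[2 * i + 1]) for i in range(n // 2)]
    return levels


# strongly anti-correlated toy series: the error estimate collapses after one blocking step
levels = flyvbjerg_petersen([0.0, 1.0] * 512)
```

For real simulation data, the estimate typically grows with the block length before it levels off; reporting the level-0 value would underestimate the uncertainty of correlated data.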

3.3.1. Density, pressure, internal energy and enthalpy

The static thermodynamic properties accessible via ms2 depend on the chosen ensemble. At constant temperature and volume (NVT ensemble), for example, the pressure is determined via the virial $W$. Other important thermodynamic properties are accessible at constant temperature and pressure. In the NpT ensemble, the volume is a fluctuating quantity whose average corresponds to the specified temperature and pressure. In both ensembles, the residual internal energy is the sum of all pairwise interaction energies $u_{ij}$ plus the appropriate long-range correction. The residual enthalpy follows from the residual internal energy, pressure and volume of the system via the thermodynamic definition.
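The pressure and the residual enthalpy can be written down explicitly. The Python sketch below uses the virial route $p = Nk_BT/V + W/(3V)$ and $H^{\mathrm{res}} = U^{\mathrm{res}} + pV - Nk_BT$ (the ideal gas enthalpy being $U^{\mathrm{id}} + Nk_BT$). The sign convention for $W$ stated in the code is an assumption, since conventions differ between texts, and long-range corrections are omitted; the function names are hypothetical.

```python
def virial_pressure(n_molecules, volume, kT, virial):
    """p = N k_B T / V + W / (3 V).

    Convention assumed here: W = sum over pairs i<j of r_ij . f_ij, with r_ij = r_i - r_j
    and f_ij the force exerted on i by j (positive for repulsion).
    """
    return n_molecules * kT / volume + virial / (3.0 * volume)


def residual_enthalpy(u_res, pressure, volume, n_molecules, kT):
    """H^res = U^res + p V - N k_B T, i.e. H - H^id with H^id = U^id + N k_B T."""
    return u_res + pressure * volume - n_molecules * kT


# reduced units, made-up numbers
p = virial_pressure(n_molecules=100, volume=125.0, kT=1.5, virial=375.0)
h_res = residual_enthalpy(u_res=-500.0, pressure=p, volume=125.0, n_molecules=100, kT=1.5)
```

For an ideal gas ($W = 0$, $U^{\mathrm{res}} = 0$), both residual terms vanish, which is a quick sanity check of the signs.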

3.3.2. Second derivatives

The implemented second derivatives vary with the employed ensemble. In the NVT ensemble, the residual isochoric heat capacity $c_v^{\mathrm{res}}$ is determined from fluctuations of the residual potential energy $U^{\mathrm{res}}$. The partial derivative of the potential energy with respect to the volume at constant temperature, $(\partial U^{\mathrm{res}}/\partial V)_T$, is determined from fluctuations of the residual potential energy $U^{\mathrm{res}}$ and the virial $W$.
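The standard canonical fluctuation formula $c_v^{\mathrm{res}} = (\langle U_{\mathrm{res}}^2\rangle - \langle U^{\mathrm{res}}\rangle^2)/(N k_B T^2)$ (per molecule) can be sketched in Python as below. Reduced units with $k_B = 1$ are assumed by default, and the function name is hypothetical; ms2 additionally attaches a block-averaging uncertainty to the result.

```python
def residual_isochoric_heat_capacity(u_res_samples, n_molecules, T, k_B=1.0):
    """c_v^res per molecule from NVT fluctuations: (<U^2> - <U>^2) / (N k_B T^2)."""
    n = len(u_res_samples)
    mean = sum(u_res_samples) / n
    var = sum((u - mean) ** 2 for u in u_res_samples) / n
    return var / (n_molecules * k_B * T * T)


cv_res = residual_isochoric_heat_capacity([0.0, 2.0] * 50, n_molecules=1, T=1.0)
```

Because the variance of an extensive quantity grows with $N$, dividing by $N$ yields an intensive heat capacity per molecule.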

In the NpT ensemble, the residual isobaric heat capacity $c_p^{\mathrm{res}}$, the isothermal compressibility $\beta_T$ and the volume expansivity $\alpha_p$ are functions of ensemble fluctuations [29]. The residual isobaric heat capacity $c_p^{\mathrm{res}}$ is related to fluctuations of the residual enthalpy $H^{\mathrm{res}}$, and the isothermal compressibility $\beta_T$ is calculated from the volume fluctuations. The partial derivative of the residual enthalpy with respect to the pressure at constant temperature, $(\partial H^{\mathrm{res}}/\partial p)_T$, is determined from fluctuations of the volume and the residual internal energy, and the volume expansivity $\alpha_p$ from fluctuations of the volume and the residual enthalpy.
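Two of these isothermal-isobaric fluctuation formulas are short enough to show directly: $\beta_T = \langle\delta V^2\rangle/(k_B T\langle V\rangle)$ and $\alpha_p = \langle\delta V\,\delta H\rangle/(k_B T^2\langle V\rangle)$, where $\delta$ denotes the deviation from the mean and $H$ is the configurational enthalpy $U^{\mathrm{res}} + pV$ of each sample. The Python sketch assumes reduced units and equally weighted samples; the function name is hypothetical.

```python
def npt_fluctuation_properties(volumes, enthalpies, T, k_B=1.0):
    """Return (beta_T, alpha_p) from NpT volume and enthalpy time series."""
    n = len(volumes)
    v_mean = sum(volumes) / n
    h_mean = sum(enthalpies) / n
    var_v = sum((v - v_mean) ** 2 for v in volumes) / n
    cov_vh = sum((v - v_mean) * (h - h_mean) for v, h in zip(volumes, enthalpies)) / n
    beta_T = var_v / (k_B * T * v_mean)             # isothermal compressibility
    alpha_p = cov_vh / (k_B * T * T * v_mean)       # volume expansivity
    return beta_T, alpha_p


beta_T, alpha_p = npt_fluctuation_properties([9.0, 11.0], [1.0, 3.0], T=1.0)
```

Using the covariance rather than separate variances is what links the volume expansivity to the coupled volume-enthalpy fluctuations.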

The speed of sound $c$ is the velocity of a sound wave traveling through an elastic medium. It follows from the isothermal compressibility $\beta_T$, the volume expansivity $\alpha_p$, the isobaric heat capacity $c_p$ and the temperature $T$, together with the density. In ms2, the speed of sound is calculated for both pure components and mixtures in the NpT ensemble.
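One common way to combine these quantities is via the adiabatic compressibility $\beta_S = \beta_T - T v\alpha_p^2/c_p$ and $c = 1/\sqrt{\rho_m\beta_S}$, with molar volume $v$, molar isobaric heat capacity $c_p$ (full, not residual) and mass density $\rho_m = M/v$. The Python sketch below uses approximate literature values for liquid water at 25 °C purely as a plausibility check; these inputs are not ms2 results, and the function name is hypothetical.

```python
import math


def speed_of_sound(beta_T, alpha_p, c_p, T, v, M):
    """Speed of sound from NpT properties (SI units).

    beta_T : isothermal compressibility [1/Pa]
    alpha_p: volume expansivity [1/K]
    c_p    : molar isobaric heat capacity, including the ideal part [J/(mol K)]
    T      : temperature [K]
    v      : molar volume [m^3/mol]
    M      : molar mass [kg/mol]
    """
    beta_S = beta_T - T * v * alpha_p ** 2 / c_p   # adiabatic compressibility
    return math.sqrt(v / (M * beta_S))             # c = 1 / sqrt(rho_m * beta_S), rho_m = M / v


# approximate values for liquid water at 298.15 K and 0.1 MPa
c_water = speed_of_sound(4.52e-10, 2.57e-4, 75.3, 298.15, 18.07e-6, 18.015e-3)
```

The result lands close to the measured value of roughly 1.5 km/s; for water the correction term is small because $\alpha_p$ is small.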


3.3.3. Chemical potential

The chemical potential of a given component can be separated into the temperature-dependent part $\mu_i^{\mathrm{id}}(T)$ of the ideal gas contribution to the chemical potential of the pure component and the remaining contribution $\mu_i(T,\mathbf{x},p) - \mu_i^{\mathrm{id}}(T)$. The solely temperature-dependent ideal gas contribution cancels out in the calculation of phase equilibria and is therefore not implemented in ms2. The remaining contribution reflects the real substance behavior due to the molecular interactions and can be determined with ms2 using two different techniques: Widom’s test molecule method and gradual insertion.

Widom. A conceptually straightforward approach to calculate the chemical potential of a component $i$ was presented by Widom [31]. It allows the chemical potential to be determined at low computational cost, both for pure substances and mixtures. For a three-center LJ fluid, the computational demand is shown in Table 1. The execution time for the determination of the chemical potential increases roughly linearly with the specified number of sampled test molecules.

The accuracy of the calculation varies with the number of test molecules and the density of the investigated fluid. For very dense fluids, the results are subject to poor statistics and the method may even fail. Within limits, lower statistical uncertainties of the chemical potential can be achieved by inserting a large number of test molecules into the simulation volume.
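The principle can be sketched for a single-site LJ fluid in the NVT ensemble: test molecules are placed at random positions, their interaction energy $\Delta U$ with the real molecules is computed, and $\mu^{\mathrm{res}} = -k_BT\ln\langle\exp(-\Delta U/k_BT)\rangle$. (In the NpT ensemble the average is additionally volume-weighted.) The Python sketch below uses reduced units, a cubic periodic box and a plain cut-off without tail correction; the function names are hypothetical, and returning infinity when every insertion overlaps mirrors the failure at very high densities mentioned above.

```python
import math
import random


def lj_insertion_energy(test_pos, positions, box, eps=1.0, sigma=1.0, r_cut=2.5):
    """Interaction energy of a single-site LJ test molecule with all real molecules."""
    u = 0.0
    for p in positions:
        r2 = 0.0
        for k in range(3):
            d = test_pos[k] - p[k]
            d -= box * round(d / box)   # minimum image convention in a cubic box
            r2 += d * d
        if r2 == 0.0:
            return math.inf             # complete overlap
        if r2 < r_cut * r_cut:
            sr6 = (sigma * sigma / r2) ** 3
            u += 4.0 * eps * (sr6 * sr6 - sr6)
    return u


def widom_mu_res(positions, box, kT, n_test, rng):
    """Residual chemical potential (NVT): mu_res = -kT ln < exp(-dU / kT) >."""
    boltzmann_sum = 0.0
    for _ in range(n_test):
        test_pos = [rng.uniform(0.0, box) for _ in range(3)]
        boltzmann_sum += math.exp(-lj_insertion_energy(test_pos, positions, box) / kT)
    if boltzmann_sum == 0.0:
        return math.inf   # every insertion overlapped: the method fails (very dense fluid)
    return -kT * math.log(boltzmann_sum / n_test)


# an empty box behaves like an ideal gas, so mu_res vanishes
mu_empty = widom_mu_res([], box=10.0, kT=1.0, n_test=100, rng=random.Random(0))
```

In practice the insertions are repeated on many configurations of the running simulation, and the Boltzmann factors are averaged over all of them.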

Gradual insertion. A more advanced method to determine the chemical potential, which is reliable also at very high densities, is gradual insertion. It is briefly described here; further details, including all parameters, are discussed in the literature [32, 33, 40, 41].

Instead of inserting complete test molecules, a fluctuating molecule is introduced into the simulation, which can appear in different states of coupling with the other molecules. In the decoupled state, the fluctuating molecule does not interact at all with the other molecules, while in the fully coupled state, it acts like a real molecule of the specified component $i$. Between these states, a set of partially coupled states has to be defined, each with a larger fraction of the real molecule interaction, cf. Figure 1.

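The $N-1$ real molecules plus the fluctuating molecule $\pi_l$ in the state $l$ form a set of sub-ensembles, which can be depicted by the following scheme

$$[N+\pi_0] \leftrightarrow [N+\pi_1] \leftrightarrow \dots \leftrightarrow [N+\pi_l] \leftrightarrow \dots \leftrightarrow [N+\pi_{k-1}] \leftrightarrow [N+\pi_k]. \tag{2}$$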

To switch between neighboring sub-ensembles, an additional move is introduced in a standard MC simulation. The probability of accepting a change of the fluctuating molecule from a state of coupling $l$ to a state of coupling $m$ is given by

$$P_{\mathrm{acc}}(l \to m) = \min\left[1, \frac{\omega_m}{\omega_l}\exp\left(-\frac{\psi_m - \psi_l}{k_B T}\right)\right], \tag{3}$$

where $\psi_l$ denotes the interaction energy of the fluctuating molecule in the state $l$ with all other $N-1$ real molecules and $k_B T$ is the Boltzmann constant multiplied by the temperature. The states of coupling are weighted by the weighting factors $\omega_l$ to avoid an unbalanced sampling of the different states. If specified, the weighting factors are adjusted during the simulation, depending on the number of times $N_s(l)$ the fluctuating molecule appeared in state $l$, according to

$$\omega_l^{\mathrm{new}} = \omega_l^{\mathrm{old}}\,\frac{N_s(k)}{N_s(l)}. \tag{4}$$

The fully coupled state $k$ serves as the reference state for the weighting factors. Local relaxation around the fluctuating molecule is enhanced by biased translational and rotational moves in its vicinity throughout the simulation [33]. The chemical potential $\mu_i(T,\mathbf{x},p) - \mu_i^{\mathrm{id}}(T)$ of component $i$ is then determined by

$$\mu_i(T,\mathbf{x},p) - \mu_i^{\mathrm{id}}(T) = k_B T \ln\left(\frac{N_i}{V}\,\frac{\omega_k}{\omega_0}\,\frac{\mathrm{Prob}[N+\pi_0]}{\mathrm{Prob}[N+\pi_k]}\right), \tag{5}$$

where $\mathrm{Prob}[N+\pi_0]$ and $\mathrm{Prob}[N+\pi_k]$ are the probabilities of observing an ensemble with the fluctuating molecule in the fully decoupled and fully coupled state, respectively.
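The bookkeeping of Eqs. (3) to (5) can be sketched in Python. Two assumptions are made for illustration: the probabilities $\mathrm{Prob}[N+\pi_0]$ and $\mathrm{Prob}[N+\pi_k]$ are estimated from the visit counts $N_s$ of the two end states, and the last list index denotes the fully coupled state $k$. Computing the acceptance in logarithmic form avoids overflow of the exponential. All function names are hypothetical; the moves that actually change the coupling state and relax the surroundings are not shown.

```python
import math
import random


def accept_coupling_change(psi_l, psi_m, w_l, w_m, kT, rng):
    """Eq. (3): accept the change l -> m with probability min(1, w_m/w_l exp(-(psi_m - psi_l)/kT))."""
    log_ratio = math.log(w_m / w_l) - (psi_m - psi_l) / kT
    if log_ratio >= 0.0:
        return True
    return rng.random() < math.exp(log_ratio)


def update_weights(weights, visits):
    """Eq. (4): w_l <- w_l * Ns(k) / Ns(l), with the fully coupled state k as reference."""
    k = len(weights) - 1
    if visits[k] == 0:
        return list(weights)            # no reference statistics yet
    return [w * visits[k] / n if n > 0 else w for w, n in zip(weights, visits)]


def chemical_potential(n_i, volume, weights, visits, kT):
    """Eq. (5), with Prob[N+pi_0] / Prob[N+pi_k] estimated from the visit counts."""
    k = len(weights) - 1
    return kT * math.log((n_i / volume) * (weights[k] / weights[0]) * (visits[0] / visits[k]))


new_weights = update_weights([1.0, 1.0, 1.0], [40, 20, 10])   # over-visited states lose weight
```

With well-adjusted weights, all coupling states are visited about equally often, which is exactly what the update rule of Eq. (4) drives the sampling towards.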

The gradual insertion method yields good results for the chemical potential even in cases where Widom’s test molecule method fails. Its disadvantages are the extended simulation time and the additional effort needed to define the fluctuating states.

3.3.4. Henry’s law constant

The solubility of solutes in solvents is characterized by the Henry’s law constant. In a solvent of given composition, it is solely a function of temperature. The Henry’s law constant $H_i$ is related to the residual chemical potential of the solute $i$ at infinite dilution, $\mu_i^{\mathrm{res},\infty}$, as defined, e.g., in [42], by

$$H_i = \rho k_B T \exp\left(\frac{\mu_i^{\mathrm{res},\infty}}{k_B T}\right), \tag{6}$$

where $\rho$ denotes the solvent density. The residual chemical potential of a component at infinite dilution can often be calculated with Widom’s test molecule method. In this case, the solvent is simulated in the NpT ensemble at its saturated vapor pressure, while the solute is introduced as an additional component of the mixture with a mole fraction of zero. The solute is thus only added in the form of test molecules to determine its chemical potential at infinite dilution. For dense phases, where Widom’s method may fail, the chemical potential and thus the Henry’s law constant can be calculated with the gradual insertion method. The number of solute molecules in the simulation is then reduced to one, representing the infinitely diluted molecule.
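Eq. (6) can equally be written in molar units, since $\rho k_B$ with a number density equals $\rho R$ with a molar density, and $\mu/(k_BT)$ per molecule equals $\mu/(RT)$ per mole. The Python sketch below uses the molar form with SI units; the function name is hypothetical.

```python
import math

R = 8.314462618  # molar gas constant [J/(mol K)]


def henry_constant(rho_solvent, T, mu_res_inf):
    """Eq. (6) in molar units: H_i = rho R T exp(mu_i^res,inf / (R T)), returned in Pa.

    rho_solvent : molar density of the solvent [mol/m^3]
    T           : temperature [K]
    mu_res_inf  : residual chemical potential of the solute at infinite dilution [J/mol]
    """
    return rho_solvent * R * T * math.exp(mu_res_inf / (R * T))


h_ref = henry_constant(1000.0, 300.0, 0.0)        # no solute-solvent interaction effect
h_soluble = henry_constant(1000.0, 300.0, -5000.0)
```

A negative residual chemical potential (favorable solvation) lowers $H_i$, i.e. it corresponds to a more soluble solute.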

3.3.5. Second virial coefficient

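The second virial coefficient is related to the intermolecular potential by [43]

$$B = -2\pi\int_0^\infty \mathrm{d}r_{ij}\,\left\langle \exp\left(-\frac{u_{ij}(r_{ij},\omega_i,\omega_j)}{k_B T}\right) - 1\right\rangle_{\omega_i,\omega_j} r_{ij}^2, \tag{7}$$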

where the $\langle\dots\rangle$ brackets indicate the average over the orientations $\omega_i$ and $\omega_j$ of two molecules $i$ and $j$ separated by a center-of-mass distance $r_{ij}$. The term in brackets, Mayer’s f-function, is evaluated at numerous distances in an appropriate range. ms2 evaluates Mayer’s f-function by averaging over numerous random orientations at each radius.
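For a spherically symmetric single-site LJ 12-6 fluid, the orientational average in Eq. (7) is trivial, so $B$ reduces to a one-dimensional quadrature. The Python sketch below uses reduced units ($\sigma = \varepsilon = k_B = 1$), the trapezoidal rule and a finite upper limit whose neglected tail is small; ms2 instead averages over random orientations for multi-site molecules, and the function name is hypothetical.

```python
import math


def lj_second_virial(T_star, r_max=20.0, n=20000):
    """Second virial coefficient of the single-site LJ 12-6 fluid via Eq. (7).

    Reduced units (sigma = epsilon = k_B = 1); result in units of sigma^3.
    Trapezoidal rule on [0, r_max]; the tail beyond r_max is neglected.
    """
    dr = r_max / n
    integral = 0.0
    for j in range(n + 1):
        r = j * dr
        if r == 0.0:
            f = -1.0                              # exp(-u/kT) -> 0 as r -> 0
        else:
            sr6 = r ** -6
            u = 4.0 * (sr6 * sr6 - sr6)           # LJ 12-6 pair potential
            f = math.exp(-u / T_star) - 1.0       # Mayer f-function
        weight = 0.5 if j in (0, n) else 1.0      # trapezoidal end-point weights
        integral += weight * f * r * r
    return -2.0 * math.pi * integral * dr


for T_star in (1.0, 3.42, 10.0):
    print(T_star, lj_second_virial(T_star))
```

$B$ is negative at low temperature, where attraction dominates, and changes sign near the LJ Boyle temperature $T^* \approx 3.42$.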


3.3.6. Transport properties

In ms2, transport properties are calculated using equilibrium MD. Here, the fluctuations of a system around its equilibrium state are evaluated as a function of time, and the Green-Kubo formalism relates these microscopic fluctuations to the respective transport properties.

Diffusion coefficients. The self-diffusion coefficient $D_i$ is related to the mass flux of single molecules within a fluid. Therefore, the relevant Green-Kubo expression is based on the velocity autocorrelation function of the individual molecules [44]. Since all molecules contribute to the self-diffusion coefficient, the autocorrelation function is averaged over all $N_i$ molecules of component $i$ in the ensemble to achieve better statistics. In binary mixtures, the Maxwell-Stefan diffusion coefficient $\bar{D}_{ij}$ is given by [45]

$$\bar{D}_{ij} = \frac{x_j}{x_i}\Lambda_{ii} + \frac{x_i}{x_j}\Lambda_{jj} - \Lambda_{ij} - \Lambda_{ji}, \tag{8}$$

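where $x_i = N_i/N$ and $\Lambda_{ij}$ can be written in terms of the center-of-mass velocity $\mathbf{v}_{i,k}$ of molecule $k$ of component $i$

$$\Lambda_{ij} = \frac{1}{3N}\int_0^\infty \mathrm{d}t\,\left\langle \sum_{k=1}^{N_i}\mathbf{v}_{i,k}(0)\cdot\sum_{l=1}^{N_j}\mathbf{v}_{j,l}(t)\right\rangle. \tag{9}$$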

From the expressions above, the collective character of the Maxwell-Stefan diffusion coefficient is evident: unlike the self-diffusion coefficient, it cannot be averaged over individual molecules, so a given system size and simulation time yield significantly less data and therefore larger statistical uncertainties. Note that equations for the Maxwell-Stefan diffusion coefficient are implemented in ms2 for both binary and ternary mixtures [45].
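The self-diffusion route can be sketched as $D_i = \frac{1}{3}\int_0^\infty \langle\mathbf{v}_k(0)\cdot\mathbf{v}_k(t)\rangle\,\mathrm{d}t$, with the velocity autocorrelation function averaged over all molecules of component $i$ and over time origins, and the integral truncated after `max_lag` stored frames. Eq. (8) is included as a plain function of given $\Lambda_{ij}$ values. The Python sketch below is an illustration under these assumptions (pure-Python loops, trapezoidal rule); the function names are hypothetical.

```python
def self_diffusion_coefficient(velocities, dt, max_lag):
    """Green-Kubo self-diffusion coefficient from a velocity trajectory.

    velocities : velocities[t][k] = (vx, vy, vz) of molecule k of component i at frame t
    dt         : time between stored frames
    max_lag    : number of frames over which the VACF is integrated (< number of frames)
    """
    n_frames = len(velocities)
    n_mol = len(velocities[0])
    vacf = []
    for lag in range(max_lag + 1):
        acc = 0.0
        n_origins = n_frames - lag
        for t0 in range(n_origins):
            for k in range(n_mol):
                a, b = velocities[t0][k], velocities[t0 + lag][k]
                acc += a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
        vacf.append(acc / (n_origins * n_mol))   # average over time origins and molecules
    integral = dt * (sum(vacf) - 0.5 * (vacf[0] + vacf[-1]))   # trapezoidal rule
    return integral / 3.0


def maxwell_stefan_binary(x_i, x_j, lam_ii, lam_jj, lam_ij, lam_ji):
    """Eq. (8): binary Maxwell-Stefan diffusion coefficient from the Lambda_ij integrals."""
    return (x_j / x_i) * lam_ii + (x_i / x_j) * lam_jj - lam_ij - lam_ji


# toy trajectory: two molecules moving with constant velocity (VACF = 1 at all lags)
traj = [[(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)] for _ in range(10)]
d_toy = self_diffusion_coefficient(traj, dt=0.5, max_lag=4)
```

The upper integration limit must be long enough for the VACF to decay; for real trajectories, the running integral is usually monitored for a plateau.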

Shear viscosity. The shear viscosity η, as defined by Newton's "law" of viscosity, is a measure of the resistance of a fluid to a shearing force [46]. It is associated with momentum transport under the influence of velocity gradients. Hence, the shear viscosity can be related to the time autocorrelation function of the off-diagonal elements of the stress tensor J_p [44]

$$\eta = \frac{1}{V k_\mathrm{B} T}\int_0^\infty \mathrm{d}t \left\langle J_p^{xy}(t)\cdot J_p^{xy}(0)\right\rangle. \qquad (10)$$

Averaging over all three independent off-diagonal elements of the stress tensor, i.e. J_p^xy, J_p^xz and J_p^yz, improves the statistics.
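Eq. (10) and the averaging over the three off-diagonal elements can be sketched as follows, assuming time series of J_p^xy, J_p^xz and J_p^yz sampled every dt in consistent units. The helper names and the trapezoidal integration up to n_corr frames are assumptions made for illustration.

```python
def autocorrelation(series, n_corr):
    """<A(t0) A(t0 + lag)> averaged over all available time origins t0."""
    n_orig = len(series) - n_corr + 1
    return [sum(series[t0] * series[t0 + lag] for t0 in range(n_orig)) / n_orig
            for lag in range(n_corr)]


def trapezoid(values, dt):
    """Trapezoidal rule for equally spaced samples."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def shear_viscosity(j_xy, j_xz, j_yz, volume, kT, dt, n_corr):
    """Eq. (10), averaged over the three independent off-diagonal elements."""
    acfs = [autocorrelation(s, n_corr) for s in (j_xy, j_xz, j_yz)]
    mean_acf = [(a + b + c) / 3.0 for a, b, c in zip(*acfs)]
    return trapezoid(mean_acf, dt) / (volume * kT)
```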

The component J_p^xy of the microscopic stress tensor J_p is given in terms of the molecular positions and velocities by [46]

$$J_p^{xy} = \sum_{i=1}^{N} m_i v_i^x v_i^y - \sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\sum_{k=1}^{n}\sum_{l=1}^{n} r_{ij}^x \frac{\partial u_{ij}}{\partial r_{kl}^y}. \qquad (11)$$

Here, the lower indices k and l indicate the n interaction sites of a molecule and the upper indices x and y denote the spatial vector components, e.g. of the velocity v_i^x or the site-site distance r_ij^x. The first term of Eq. (11) is the kinetic energy contribution and the second term is the potential energy contribution to the shear viscosity. Because the autocorrelation function in Eq. (10) is a product of two such sums, the Green-Kubo integral (10) can be decomposed into three parts, i.e. the solely kinetic energy contribution, the solely potential energy contribution and the mixed kinetic-potential energy contribution [46].
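For the simplest case of single-site molecules (n = 1) interacting via a 12-6 LJ potential with σ = ε = 1, Eq. (11) reduces to a kinetic sum and a pair sum. The following sketch is an illustration rather than the ms2 routine; it returns both parts separately, so that the pure kinetic, pure potential and mixed contributions to the Green-Kubo integral can be correlated individually. It assumes r_ij = r_i − r_j, and the optional cubic box applies the minimum image convention.

```python
import math


def lj_dudr(r, eps=1.0, sigma=1.0):
    """Derivative du/dr of the 12-6 LJ potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def stress_xy_parts(pos, vel, masses, box=None):
    """Kinetic and potential part of J_p^xy, Eq. (11), for single-site molecules.

    pos[i], vel[i]: position and velocity 3-tuples; box: cubic edge length or None.
    """
    kinetic = sum(m * v[0] * v[1] for m, v in zip(masses, vel))
    potential = 0.0
    n = len(pos)
    for i in range(n - 1):
        for j in range(i + 1, n):
            r_vec = [pos[i][d] - pos[j][d] for d in range(3)]
            if box is not None:  # minimum image convention
                r_vec = [c - box * round(c / box) for c in r_vec]
            r = math.sqrt(sum(c * c for c in r_vec))
            # du/dr^y = (du/dr) * r^y / r, hence the term -r^x * (du/dr) * r^y / r
            potential -= r_vec[0] * lj_dudr(r) * r_vec[1] / r
    return kinetic, potential
```

Summing both parts frame by frame gives the time series of J_p^xy entering Eq. (10); correlating the parts separately yields the three contributions listed above.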


Bulk viscosity. The bulk viscosity η_v refers to the resistance to dilatation of an infinitesimal volume element at constant shape [47]. It can be calculated by integrating the time autocorrelation function of the diagonal elements of the stress tensor together with an additional term involving the product of pressure p and volume V, which does not appear in the shear viscosity, cf. Eq. (10). In the NVE ensemble, the bulk viscosity is given by [44, 48]

$$\eta_v = \frac{1}{V k_\mathrm{B} T}\int_0^\infty \mathrm{d}t \left\langle \left(J_p^{xx}(t) - pV(t)\right)\cdot\left(J_p^{xx}(0) - pV(0)\right)\right\rangle. \qquad (12)$$

The diagonal component J_p^xx of the microscopic stress tensor J_p is defined analogously to Eq. (11). The statistics of the ensemble average in Eq. (12) can be improved by using all three independent diagonal elements of the stress tensor, J_p^xx, J_p^yy and J_p^zz. Eqs. (11) and (12) can be applied directly to mixtures.
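A corresponding sketch of Eq. (12), again assuming equally spaced time series and trapezoidal integration up to n_corr frames. The pV term is supplied by the caller as a time series exactly as it appears in Eq. (12); the function names are hypothetical.

```python
def autocorrelation(series, n_corr):
    """<A(t0) A(t0 + lag)> averaged over all available time origins t0."""
    n_orig = len(series) - n_corr + 1
    return [sum(series[t0] * series[t0 + lag] for t0 in range(n_orig)) / n_orig
            for lag in range(n_corr)]


def trapezoid(values, dt):
    """Trapezoidal rule for equally spaced samples."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def bulk_viscosity(j_xx, j_yy, j_zz, pv, volume, kT, dt, n_corr):
    """Eq. (12), averaged over the three diagonal stress tensor elements.

    pv[t] is the pV term at frame t, supplied by the caller.
    """
    acfs = []
    for j in (j_xx, j_yy, j_zz):
        delta = [a - b for a, b in zip(j, pv)]   # J_p^aa(t) - pV(t)
        acfs.append(autocorrelation(delta, n_corr))
    mean_acf = [sum(vals) / 3.0 for vals in zip(*acfs)]
    return trapezoid(mean_acf, dt) / (volume * kT)
```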

4. Simulation program ms2: Detailed description

This section describes ms2 in detail and introduces important options for its application. It is intended to give a better understanding of the simulation program and to facilitate access to its features.

4.1. Input and output

ms2 was designed to be an easily applicable simulation program. Therefore, the input is restricted to one file defining the simulation scenario and one file for each molecular species used in the simulation. The output files contain structured information on the simulation. All calculated thermodynamic properties are summarized in one output file, which is easy to read and self-explanatory. For a more detailed evaluation of the simulation, the instantaneous and running averages of the most important thermodynamic properties are written to other files. In total, the current status of the simulation and many more details are written to six files, which can be accessed during execution.

4.2. Reduced quantities

The simulation program ms2 internally uses reduced quantities for its calculations. All quantities are reduced by a reference length σ_R, a reference energy ε_R and a reference mass m_R. These reference values are input variables and may, in principle, be chosen arbitrarily. However, it is recommended to use a reference length σ_R on the order of 3 Å, a reference energy ε_R/k_B on the order of 100 K and a reference mass m_R on the order of 50 atomic units. The reduction scheme for the most important physical quantities is listed in the Supplementary Material; the reduced form of all other quantities can be derived from it. An exception in the reduction scheme is the chemical potential, which ms2 normalizes by k_B T instead of ε_R.
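The conversion to reduced quantities can be sketched as below. Only the temperature (via ε_R) and the chemical potential (via k_B T) follow directly from the text; the density, pressure and time conversions are the conventional Lennard-Jones-type reductions and are assumptions here, since the complete ms2 scheme is given in the Supplementary Material. The class name is hypothetical, and the density is assumed to be given in mol/l.

```python
import math

K_B = 1.380649e-23         # Boltzmann constant in J/K
N_A = 6.02214076e23        # Avogadro constant in 1/mol
AMU = 1.66053906660e-27    # atomic mass unit in kg


class ReducedUnits:
    """Conversion to reduced quantities for given reference values (assumed scheme)."""

    def __init__(self, sigma_angstrom, eps_over_kB_K, mass_amu):
        self.sigma = sigma_angstrom * 1e-10     # sigma_R in m
        self.eps = eps_over_kB_K * K_B          # epsilon_R in J
        self.mass = mass_amu * AMU              # m_R in kg

    def temperature(self, T):
        return K_B * T / self.eps

    def density(self, rho_mol_per_l):
        return rho_mol_per_l * 1000.0 * N_A * self.sigma ** 3

    def pressure(self, p_Pa):
        return p_Pa * self.sigma ** 3 / self.eps

    def time(self, t_s):
        return t_s * math.sqrt(self.eps / (self.mass * self.sigma ** 2))

    def chemical_potential(self, mu_J, T):
        # normalized by k_B T instead of epsilon_R, as stated in the text
        return mu_J / (K_B * T)
```

With the recommended reference values (3 Å, 100 K, 50 u), 298.15 K corresponds to a reduced temperature of 2.9815.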

4.3. Molecular positions and orientations

The simulation program ms2 updates only the positions of the centers of mass and the orientations of the molecules; the coordinates of the sites are derived from these values on the fly. The center of mass of each molecule is stored in Cartesian coordinates. The absolute positions are reduced by the reference length σ_R and scaled by the reduced edge length of the cubic simulation volume, leading to position values in the range −0.5 < x, y, z < 0.5. The main advantage of this scaling scheme is the efficient application of the periodic boundary condition [36] and the minimum image convention [37]: since the box edge length is unity in scaled coordinates, both reduce to shifting a coordinate or a coordinate difference by the nearest integer.
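In scaled coordinates, the periodic boundary condition and the minimum image convention can be sketched in a few lines of Python, assuming the cubic box and the [−0.5, 0.5) convention described above. Multiplying by the box edge length converts a scaled distance back to (reduced) length units; the function names are hypothetical.

```python
import math


def wrap(s):
    """Map a scaled coordinate (or coordinate difference) into [-0.5, 0.5)."""
    return s - math.floor(s + 0.5)


def minimum_image_distance(s_i, s_j, box_length):
    """Distance between two scaled positions under the minimum image convention."""
    d2 = 0.0
    for a, b in zip(s_i, s_j):
        d = wrap(a - b)          # nearest periodic image along this axis
        d2 += d * d
    return box_length * math.sqrt(d2)
```

For example, two molecules at scaled x = 0.45 and x = −0.45 in a box of edge length 10 are separated by 1.0 through the periodic boundary, not by 9.0.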

The molecular orientations are stored as normalized quaternions [29]. Unlike Euler angles, quaternions have no singularities and therefore avoid divergence problems at small angles. Their use allows for an efficient calculation of site positions, while being less demanding in execution time and memory than updating and storing all site positions.
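The site positions follow from the stored center of mass and quaternion as r_site = r_com + R(q) d, where d is the site position in the body-fixed frame. The sketch below uses the common (w, x, y, z) quaternion convention with R(q) rotating body-fixed vectors into the laboratory frame; whether ms2 uses the same component ordering and rotation direction is an assumption, and the function names are hypothetical.

```python
import math


def normalize(q):
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def quaternion_to_matrix(q):
    """Rotation matrix of a normalized quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def site_positions(com, q, body_sites):
    """Lab-frame site positions: center of mass + R(q) * body-fixed site vector."""
    rot = quaternion_to_matrix(normalize(q))
    return [tuple(com[a] + sum(rot[a][b] * s[b] for b in range(3)) for a in range(3))
            for s in body_sites]
```

Only the four quaternion components per molecule need to be stored and integrated; the site coordinates are regenerated from them whenever interactions are evaluated.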

4.4. Initial conﬁguration

It is important to define a stable and physically reasonable starting configuration for a simulation in a reliable way. In ms2, the molecules are initially placed on a face-centered cubic lattice in order to avoid overlaps between molecules. For mixtures, the molecules of the different species are distributed randomly on the lattice, ensuring a homogeneous distribution of all components in the simulation volume.
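A sketch of the lattice start in scaled coordinates: the smallest fcc lattice with at least N sites is built, and the species labels are shuffled before being assigned to the sites. The function name and the (name, count) input format are hypothetical; if N is smaller than the number of lattice sites, this sketch simply leaves the last sites empty, which ms2 may handle differently.

```python
import random


def fcc_lattice(species_counts, seed=0):
    """Place molecules on an fcc lattice in scaled coordinates [-0.5, 0.5)
    and assign the species randomly to the lattice sites."""
    n_mol = sum(count for _, count in species_counts)
    n_cells = 1
    while 4 * n_cells ** 3 < n_mol:      # smallest lattice with enough sites
        n_cells += 1
    basis = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.5, 0.0, 0.5), (0.0, 0.5, 0.5)]
    sites = []
    for ix in range(n_cells):
        for iy in range(n_cells):
            for iz in range(n_cells):
                for bx, by, bz in basis:
                    sites.append(((ix + bx) / n_cells - 0.5,
                                  (iy + by) / n_cells - 0.5,
                                  (iz + bz) / n_cells - 0.5))
    labels = [name for name, count in species_counts for _ in range(count)]
    random.Random(seed).shuffle(labels)  # random, homogeneous species distribution
    return list(zip(labels, sites[:n_mol]))
```

For 500 molecules, this yields a complete lattice of 5 × 5 × 5 unit cells with four sites each.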

After the initial placement, the configuration can be relaxed by translational and rotational MC moves. This step is part of the initialization process and can therefore be performed regardless of the simulation technique used. The number of relaxation moves is user defined and should be chosen large enough to achieve a physically reasonable starting configuration. For MD, the molecules are subsequently assigned initial velocities that correspond to the specified temperature, with no net translational or rotational momentum. It is recommended to continue with an MD equilibration until a physically reasonable configuration and distribution of velocities is achieved. The MC loops relax possible overlaps in the initial configuration, whereas the MD equilibration drives the system into a physically reasonable dynamic microstate. Note that the equilibration process does not contribute to the calculation of the thermodynamic properties.
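The velocity initialization for MD can be sketched for the translational degrees of freedom: Gaussian velocities are drawn, the net translational momentum is removed, and the velocities are rescaled to the target temperature in reduced units (k_B T). Rotational velocities and the removal of the net angular momentum are omitted here, so this is only a partial, hypothetical sketch of the procedure described above, not the ms2 routine.

```python
import math
import random


def initial_velocities(masses, kT, seed=0):
    """Translational velocities with zero total momentum, rescaled so that the
    kinetic temperature equals kT exactly."""
    rng = random.Random(seed)
    vel = [[rng.gauss(0.0, math.sqrt(kT / m)) for _ in range(3)] for m in masses]
    total_mass = sum(masses)
    # remove the center of mass velocity (net translational momentum)
    v_cm = [sum(m * v[d] for m, v in zip(masses, vel)) / total_mass for d in range(3)]
    for v in vel:
        for d in range(3):
            v[d] -= v_cm[d]
    # 3N - 3 translational degrees of freedom remain after fixing the momentum
    dof = 3 * len(masses) - 3
    two_ekin = sum(m * (v[0] ** 2 + v[1] ** 2 + v[2] ** 2) for m, v in zip(masses, vel))
    scale = math.sqrt(dof * kT / two_ekin)
    return [[scale * c for c in v] for v in vel]
```

Rescaling after removing the center of mass velocity keeps the total momentum at zero, since all velocities are multiplied by the same factor.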

4.5. Intermolecular interactions

4.5.1. Dispersive and repulsive interactions

In ms2, the dispersive and repulsive interactions between molecules are reduced to pairwise interactions u_ij^LJ between the sites i and j of different molecules, which are modeled by the 12-6 LJ potential [49]

$$u_{ij}^\mathrm{LJ} = 4\varepsilon\left[\left(\frac{\sigma}{r_{ij}}\right)^{12} - \left(\frac{\sigma}{r_{ij}}\right)^{6}\right]. \qquad (13)$$

Here, r_ij denotes the distance between two interacting LJ sites i and j, while σ and ε are the LJ size and energy parameters, respectively. The LJ potential is widely used and allows for a fast computation of these basic interactions. It has only two parameters, which facilitates the parameterization of molecular models.

For pure components, the interactions between two different LJ sites are described by the Lorentz-Berthelot combination rules [50, 51]. For mixtures, the combination rules are extended to the modified Lorentz-Berthelot rules, which include two additional parameters η and ξ to describe the interactions between LJ sites of unlike molecules [52]

$$\sigma_{ij} = \eta\,\frac{\sigma_i+\sigma_j}{2}, \qquad (14)$$

$$\varepsilon_{ij} = \xi\sqrt{\varepsilon_i\varepsilon_j}. \qquad (15)$$


The two parameters scale all LJ interactions between molecules of different components equally. Arbitrary modifications of the combination rules are possible: ms2 allows η and ξ to be specified independently for every pair of components, which permits a free parameterization of the combination rules.
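Eqs. (13) to (15) translate directly into code. In the sketch below, the binary parameters are kept in a dictionary keyed by the pair of component names; this data layout is a hypothetical choice for illustration, with η = ξ = 1 (the plain Lorentz-Berthelot rules) as the default for unlisted pairs, including like pairs.

```python
import math


def lj_potential(r, sigma, eps):
    """Eq. (13): 12-6 Lennard-Jones site-site energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def combine(sigma_i, eps_i, sigma_j, eps_j, eta=1.0, xi=1.0):
    """Eqs. (14) and (15): modified Lorentz-Berthelot combination rules."""
    sigma_ij = eta * (sigma_i + sigma_j) / 2.0
    eps_ij = xi * math.sqrt(eps_i * eps_j)
    return sigma_ij, eps_ij


def unlike_parameters(comp_a, comp_b, table):
    """Look up (eta, xi) for a component pair; unlisted pairs default to (1, 1)."""
    return table.get(frozenset((comp_a, comp_b)), (1.0, 1.0))
```

A usage example with a hypothetical binary parameter: `eta, xi = unlike_parameters("methanol", "ethanol", {frozenset(("methanol", "ethanol")): (1.0, 0.97)})`, followed by `combine(...)` with the site parameters of both molecules.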

4.5.2. Electrostatic interactions

ms2 considers electrostatic interactions between electro-neutral molecules. Three electrostatic site types are available: point charge, point dipole and linear point quadrupole. The two higher order electrostatic interaction sites each represent a characteristic arrangement of several partial charges on a molecule. For a given number of electrostatic sites, higher order polar sites offer faster execution and a better description of the electrostatic potential of the molecule [53]. Combining two partial charges into one point dipole reduces the computational effort in ms2 by about 30%, and combining three point charges into one point quadrupole reduces it by as much as 60%. An additional advantage is the simplification of the molecular model, which reduces the number of model parameters. In the following, the implemented electrostatic interactions between sites of the same type are briefly described. The interaction potentials between unlike electrostatic site types are given in the Supplementary Material.

Point charge. Point charges are first order electrostatic interaction sites. The electrostatic interaction between two point charges q_i and q_j is given by Coulomb's law [29]

$$u_{ij}^{qq}(r_{ij},q_i,q_j) = \frac{1}{4\pi\varepsilon_0}\frac{q_i q_j}{r_{ij}}. \qquad (16)$$

This interaction decays with the inverse of r_ij and is therefore significant up to very large distances. For computational efficiency, the Coulombic contributions to the potential energy are evaluated explicitly only up to a specified cut-off radius r_c. The long range interactions with charges beyond r_c are corrected for by the reaction field method. Its application to point charges is discussed below.

Point dipole. A point dipole describes the electrostatic field of two point charges of equal magnitude but opposite sign at a mutual distance a → 0. Its moment μ is defined by μ = qa. The electrostatic interaction between two point dipoles with the moments μ_i and μ_j at a distance r_ij is given by [29, 43]

$$u_{ij}^{DD}(r_{ij},\theta_i,\theta_j,\phi_{ij},\mu_i,\mu_j) = \frac{1}{4\pi\varepsilon_0}\frac{\mu_i\mu_j}{r_{ij}^3}\left(\sin\theta_i\sin\theta_j\cos\phi_{ij} - 2\cos\theta_i\cos\theta_j\right), \qquad (17)$$

where θ_i is the angle between the direction of dipole i and the distance vector of the two interacting dipoles, and φ_ij is the azimuthal angle between the two dipole directions, cf. Figure 2. In ms2, the interaction between two dipoles is evaluated explicitly up to the specified cut-off radius r_c. The long range contributions are taken into account by the reaction field method [29].
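Eq. (17) needs the angles θ_i, θ_j and φ_ij, which can be obtained from the dipole unit vectors and the distance vector. In the sketch below, the azimuth is computed from the dipole components perpendicular to the distance vector, and 1/(4πε0) is a parameter k_e set to one, as in a reduced unit system; both are illustrative assumptions, and the function names are hypothetical. Eq. (17) is equivalent to the vector form μ_iμ_j(e_i·e_j − 3(e_i·r̂)(e_j·r̂))/r³, which offers a convenient cross-check.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dipole_angles(e_i, e_j, r_vec):
    """Return r, cos/sin theta_i, cos/sin theta_j and cos phi_ij for unit vectors e_i, e_j."""
    r = math.sqrt(dot(r_vec, r_vec))
    r_hat = [c / r for c in r_vec]
    c_i, c_j = dot(e_i, r_hat), dot(e_j, r_hat)
    # components perpendicular to the distance vector define the azimuth
    p_i = [a - c_i * b for a, b in zip(e_i, r_hat)]
    p_j = [a - c_j * b for a, b in zip(e_j, r_hat)]
    s_i, s_j = math.sqrt(dot(p_i, p_i)), math.sqrt(dot(p_j, p_j))
    if s_i < 1e-12 or s_j < 1e-12:
        cos_phi = 1.0          # azimuth undefined; its term vanishes anyway
    else:
        cos_phi = dot(p_i, p_j) / (s_i * s_j)
    return r, c_i, s_i, c_j, s_j, cos_phi


def u_dipole_dipole(r, c_i, s_i, c_j, s_j, cos_phi, mu_i, mu_j, k_e=1.0):
    """Eq. (17); k_e stands for 1/(4 pi eps0)."""
    return k_e * mu_i * mu_j / r ** 3 * (s_i * s_j * cos_phi - 2.0 * c_i * c_j)
```

For two parallel dipoles aligned head to tail along the distance vector (θ_i = θ_j = 0), Eq. (17) gives the attractive value −2μ_iμ_j/r³ in these units.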

Linear point quadrupole. A linear point quadrupole describes the electrostatic field induced either by two collinear point dipoles with the same moment but opposite orientation at a distance a → 0, or by at least three collinear point charges with alternating sign (q, −2q, q). The resulting quadrupole moment Q is defined by Q = 2qa², or equivalently Q = 2aμ with μ = qa. The electrostatic interaction between two linear point quadrupoles with the moments Q_i and Q_j at a distance r_ij is given by [43, 54]

$$u_{ij}^{QQ}(r_{ij},\theta_i,\theta_j,\phi_{ij},Q_i,Q_j) = \frac{1}{4\pi\varepsilon_0}\frac{3}{4}\frac{Q_iQ_j}{r_{ij}^5}\left[1 - 5\left((\cos\theta_i)^2 + (\cos\theta_j)^2\right) - 15(\cos\theta_i)^2(\cos\theta_j)^2 + 2\left(\sin\theta_i\sin\theta_j\cos\phi_{ij} - 4\cos\theta_i\cos\theta_j\right)^2\right], \qquad (18)$$

where the angles θ_i, θ_j and φ_ij indicate the relative angular orientation of the two point quadrupoles, as discussed above. Note that no long range correction is necessary for this interaction type if the fluid is isotropic.
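Eq. (18) can be coded in the same way, taking the cosines and sines of the angles as arguments; k_e stands for 1/(4πε0) and is set to one as an assumption, and the function name is hypothetical. Three limiting configurations serve as a check: collinear (end to end), T-shaped and parallel side by side, for which the bracket in Eq. (18) evaluates to 8, −4 and 3, i.e. energies of 6, −3 and 9/4 in units of Q_iQ_j/(4πε0 r⁵).

```python
def u_quadrupole_quadrupole(r, c_i, s_i, c_j, s_j, cos_phi, q_i, q_j, k_e=1.0):
    """Eq. (18) for two linear point quadrupoles.

    c_*, s_*: cos and sin of theta_i, theta_j; cos_phi: cos of the azimuth phi_ij.
    """
    bracket = (1.0 - 5.0 * (c_i * c_i + c_j * c_j)
               - 15.0 * c_i * c_i * c_j * c_j
               + 2.0 * (s_i * s_j * cos_phi - 4.0 * c_i * c_j) ** 2)
    return k_e * 0.75 * q_i * q_j / r ** 5 * bracket
```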

Reaction field method. Truncating the interactions of first and second order electrostatic sites, i.e. point charges and point dipoles, leads to errors that need to be corrected. For this task, ms2 implements the reaction field method [34, 55], which is widely used and well accepted [56, 57]. Its advantages are accuracy and stability, while it requires little computational effort compared to other techniques like Ewald summation [58]. However, it is limited to electro-neutral systems, so ms2 is currently restricted to electro-neutral molecules. The reaction field method implemented in ms2 assumes that the system is sufficiently large for tinfoil boundary conditions (ε_s → ∞) to be applicable without loss of accuracy. This is the case for N ≥ 500 [59, 60, 61, 62].
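For orientation, the standard textbook forms of the reaction field can be sketched as follows; they are assumptions here and not taken from ms2. For point charges, the Coulomb term is augmented by (ε_s − 1)/(2ε_s + 1) q_iq_j r²/r_c³, and for point dipoles the correction is −2(ε_s − 1)/(2ε_s + 1) μ_i·μ_j/r_c³. In the tinfoil limit ε_s → ∞ the prefactors become 1/2 and 1. Whether ms2 additionally shifts the charge term so that it vanishes at r_c is not covered by this sketch; k_e stands for 1/(4πε0), and the function names are hypothetical.

```python
import math


def rf_factor(eps_s):
    """(eps_s - 1) / (2 eps_s + 1); tends to 1/2 for tinfoil boundaries."""
    if math.isinf(eps_s):
        return 0.5
    return (eps_s - 1.0) / (2.0 * eps_s + 1.0)


def u_charge_rf(q_i, q_j, r, r_c, eps_s=math.inf, k_e=1.0):
    """Coulomb energy plus reaction field term for a charge pair with r < r_c."""
    return k_e * q_i * q_j * (1.0 / r + rf_factor(eps_s) * r * r / r_c ** 3)


def u_dipole_rf(mu_vec_i, mu_vec_j, r_c, eps_s=math.inf, k_e=1.0):
    """Reaction field correction for a dipole pair inside the cut-off sphere."""
    mu_dot = sum(a * b for a, b in zip(mu_vec_i, mu_vec_j))
    return -k_e * 2.0 * rf_factor(eps_s) * mu_dot / r_c ** 3
```

Setting ε_s = 1 switches the reaction field off and recovers the plain truncated Coulomb interaction.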

4.6. Cut-off modes

ms2 supports two different cut-off modes: the site-site cut-off and the center of mass cut-off. The site-site cut-off mode explicitly considers the interactions between all sites that are within a distance of r_c. Beyond r_c, the long range contributions to the energy and pressure are estimated by analytical functions assuming a homogeneous fluid [29]. A disadvantage of the site-site cut-off arises if molecular models contain point charges. Close to r_c, molecules are often only partially considered in the explicit calculation, cf. Figure 3. The point charges within the cut-off radius may then be unbalanced, so that a net charge within the cut-off sphere can occur. In this case, the reaction field method is not valid.

A more robust alternative is the center of mass cut-off mode. It explicitly considers all interaction sites of two molecules if their centers of mass are within the cut-off radius. The long range contributions beyond r_c due to LJ interactions are approximated by the formulations of Lustig [63]. Note that the computational advantage of the center of mass cut-off mode increases with the number of sites per molecule.
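The difference between the two cut-off modes can be illustrated for a pair of two-site molecules. In this hypothetical sketch, each molecule is a dictionary with its center of mass and absolute site positions, and periodic boundaries are ignored.

```python
def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def site_site_pairs(mol_a, mol_b, r_c):
    """Site-site mode: keep every site pair closer than r_c."""
    return [(k, l)
            for k, s_k in enumerate(mol_a["sites"])
            for l, s_l in enumerate(mol_b["sites"])
            if dist2(s_k, s_l) < r_c * r_c]


def center_of_mass_pairs(mol_a, mol_b, r_c):
    """Center of mass mode: all site pairs if the centers of mass are closer than r_c."""
    if dist2(mol_a["com"], mol_b["com"]) >= r_c * r_c:
        return []
    return [(k, l)
            for k in range(len(mol_a["sites"]))
            for l in range(len(mol_b["sites"]))]
```

With molecule A's sites at x = ±0.5 around the origin and molecule B's sites at x = 1.3 and 2.3 (center of mass at 1.8), r_c = 2 gives only three site-site pairs, so molecule B is partially considered; if its two sites carried opposite charges, the charge seen inside the cut-off sphere would be unbalanced. The center of mass mode keeps all four pairs.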

4.7. Monte-Carlo algorithm

Thermodynamic properties are determined by MC simulation via a Markov chain of molecular configurations. In ms2, this Markov chain is generated by executing a loop of N_MC moves per configuration, where N_MC is defined by

$$N_\mathrm{MC} = \frac{1}{3}\sum_{i=1}^{N} N_{i,\mathrm{DGF}}. \qquad (19)$$

Here, N represents the number of molecules in the system and N_{i,DGF} is the number of degrees of freedom of molecule i. For MC simulations, three different moves are implemented in ms2: translational and rotational displacement of a single molecule and fluctuation of the simulation volume. All three moves are well known and widely used [29] and are thus not discussed further here.

The acceptance of an MC move depends on the energy difference before and after the perturbation. To speed up execution, ms2 avoids recalculating the old energy for each attempted move. Instead, the pairwise interactions of every molecule with all remaining N−1 molecules in the system are stored in an N×N matrix. Since this matrix is symmetric, the sum over column i, or equivalently over row i, equals the potential energy of molecule i. If molecule i is selected to be moved, its old energy is determined by simply summing up all contributions in column i of the energy matrix. After the move, its new energy is determined by calculating the pairwise interactions with the remaining molecules, and the individual contributions are stored in a vector of dimension N×1. If the move is accepted, the vector replaces column i as well as row i of the energy matrix. If the move is rejected, the vector is discarded. The same technique is used for the virial contributions.
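The energy matrix bookkeeping can be sketched in Python for single-site LJ molecules in reduced units, without cut-off or periodic boundaries (both simplifying assumptions). The class name and interface are hypothetical: the sketch stores the pair energies in a symmetric N×N list of lists, reads the old energy from column i, and on acceptance copies the trial vector into column and row i. Eq. (19) is included as moves_per_loop, which rounds to an integer; this rounding is an assumption for cases where the degrees of freedom do not sum to a multiple of three.

```python
import math
import random


def moves_per_loop(dof_per_molecule):
    """Eq. (19): N_MC = (1/3) * sum of degrees of freedom (rounded here)."""
    return round(sum(dof_per_molecule) / 3)


def lj_pair(p, q):
    """Reduced 12-6 LJ energy between two single-site molecules (no cut-off)."""
    r2 = sum((a - b) ** 2 for a, b in zip(p, q))
    inv6 = 1.0 / r2 ** 3
    return 4.0 * (inv6 * inv6 - inv6)


class EnergyMatrixMC:
    def __init__(self, positions, kT, max_disp, seed=0):
        self.pos = [list(p) for p in positions]
        self.n = len(self.pos)
        self.kT, self.max_disp = kT, max_disp
        self.rng = random.Random(seed)
        # E[i][j] = pair energy of molecules i and j; the matrix is symmetric
        self.E = [self.pair_vector(i, self.pos[i]) for i in range(self.n)]

    def pair_vector(self, i, p):
        """N x 1 vector of the energies of molecule i (at position p) with all others."""
        return [0.0 if j == i else lj_pair(p, self.pos[j]) for j in range(self.n)]

    def molecule_energy(self, i):
        """Old energy of molecule i: sum over column i of the stored matrix."""
        return sum(self.E[j][i] for j in range(self.n))

    def attempt_move(self, i):
        trial = [c + self.rng.uniform(-self.max_disp, self.max_disp) for c in self.pos[i]]
        vec = self.pair_vector(i, trial)
        d_u = sum(vec) - self.molecule_energy(i)
        if d_u <= 0.0 or self.rng.random() < math.exp(-d_u / self.kT):
            self.pos[i] = trial
            for j in range(self.n):      # accepted: vector replaces column and row i
                self.E[j][i] = vec[j]
                self.E[i][j] = vec[j]
            return True
        return False                     # rejected: vector is discarded

    def loop(self, n_moves):
        """One MC loop of n_moves translational moves on randomly chosen molecules."""
        return sum(self.attempt_move(self.rng.randrange(self.n)) for _ in range(n_moves))
```

A simple consistency check is to rebuild the matrix from the final positions after a number of loops; it must coincide with the incrementally updated one.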

5. Implementation

In this section, the code design and implementation issues of ms2 are discussed. These explanations of the source code are intended to help understand the program and to encourage the reader to further develop and improve the code. New developers should get a smooth entry and the possibility to make fast adaptations to new problems. Due to the experience of the core developers and its suitability for efficient numerical code, Fortran90 was chosen as programming language. The program makes extensive use of the concepts introduced with Fortran90, notably modular programming. A wide variety of simulation setups is possible, each requiring different computations. This variety calls for a highly modular structure in which modules have clear-cut interfaces. Given such modularity, code parts that are not needed for the user defined simulation setup can simply be skipped during the calculation. Besides avoiding if-statements at computationally demanding points, and therefore leading to a better data flow, this allows for an efficient implementation of new functionalities, as existing modules can be reused whenever feasible. In ms2, a stringent philosophy regarding modularization was followed, which is discussed in section 5.1. The runs need to be fast and scale well with an increasing number of processors in order to reduce response times. ms2 is parallelized with the Message Passing Interface (MPI) [64]. While MD needs synchronization after every time step due to its deterministic nature, MC can be executed in an embarrassingly parallel way, requiring synchronization only at the very end of a run. Furthermore, by far the most expensive part of the code, the calculation of forces and potential energies, is highly vectorized and optimized.

5.1. Modular structure

Goals.

The simulation program ms2 is intended to be flexible, expandable and easy to get familiar with. Still, the program has to be efficient and effective in its calculations, so that the computer resources are used efficiently and response times are minimized. The introduction of Abstract Data Types (ADT) is a popular approach to reduce the complexity of a software system by decomposing it into smaller subunits, which are easier to maintain. The ADT model, with its data and code abstraction, forms the basis of Object-Oriented Programming (OOP). It should be mentioned that OOP poses challenges regarding efficiency: distributing the data between objects can lead to data fragmentation and indirections, which have a negative impact on performance. One objective was to introduce OOP concepts while still maintaining efficiency, cf. section 6.

Fortran issues.

Fortran90 modules are a further development of Fortran77 common blocks. A module may not only contain data, but also define subroutines and user derived data types. Since module data and associated module subroutines inherently implement singletons, grouping module-defined user derived data types and associated functions also allows an OOP-like programming style. A small example demonstrates this. The module ms_simulation, defined in the file ms_simulation.F90, contains the type TSimulation as well as the subroutines

• TSimulation_Construct(this)

• TSimulation_Destruct(this)

• TSimulation_RunSteps(this, StepStart, StepEnd)

among others. The strict naming convention helps avoid name space clashes. All the subroutines above receive a reference to an instance of TSimulation through the first parameter, named this, and serve as (default) constructor, destructor and ordinary member function, respectively. In the following, constructs like this are referred to as a class. ms2 does not strictly follow an OOP approach, because all data elements can be accessed directly; there is therefore no information hiding through data encapsulation.

Class structure.

In order to keep the class hierarchy flat, ms2 makes use neither of inheritance nor, hence, of polymorphism. For example, the module ms2_site contains the classes TSiteLJ126, TSiteCharge, TSiteDipole and TSiteQuadrupole, which have data and functionalities in common, but are not derived from a common class. The class structure and the relations between the classes are organized in six levels, as depicted in Figure 4. Every class is located on a certain level and hence has data and modules relevant to its level. The six levels are intuitive:

1. Global – all data and functions of global use are handled. Scope: whole code

2. Simulation – setup and control of the simulation. Scope: simulation framework

3. Ensemble – the required ensemble is set up and initialized. Scope: molecular ensemble

4. Component – data and functions dealing with all molecules of a given molecule type. Scope: one molecule species

5. Molecule – data regarding the basic structure of a molecule. Scope: one molecule

6. Site – position of sites with respect to a molecule. Scope: one site of one molecule

All subroutines on every level are task specific and implemented as modules. Depending on the simulation setup, e.g. MD/MC or ensemble, the appropriate modules are engaged individually during the simulation. The modules of ms2 can roughly be divided into the following tasks:


• initialization

• simulation

• accumulation

• output

If a task spans more than one level, the corresponding module on the topmost level is called, which in turn calls

modules on lower hierarchical levels and so on.

First level - global. The first level contains subroutines and variables that are needed globally in all parts of the simulation. The variables are mainly natural constants like the Boltzmann constant k_B, the Avogadro constant N_A, etc. Additionally, variables defining the simulation setup for the given run are stored, e.g. the simulation technique and the ensemble type. These variables determine the type of calculation and are passed on to the levels below according to the user specifications. The subroutines stored on this level are also of global use in ms2. These are basic routines for input into the simulation program and output into files, as well as more advanced routines for automated communication between hardware and program via signals. The latter routines are of particular interest for running ms2 on high performance computers, where possible data loss can be avoided by automated communication between cluster nodes and the program.

Second level - simulation. On the second level, the simulation flow is controlled: the simulation is initiated, started and sustained. This includes controlling the length of the simulation and its output. User specifications concerning the simulation setup are read and the corresponding global variables are assigned accordingly. Furthermore, all simulation quantities and averages are analyzed and written to file.

Third level - ensemble. On the third level, the simulation is organized according to the specified ensemble. The respective ensemble is initialized by characterizing the ensemble setup and defining e.g. the volume and the composition of the molecular species in the simulation volume. In addition, the initial positions and velocities of all molecules are assigned. Furthermore, the routines for the subsequent simulation are selected on this level. The time integration or the Markov chain, respectively, is performed, and the ensemble averages of the thermodynamic properties are calculated during the simulation run.

Fourth level - component. On this level, the molecule species (components) and their interactions are addressed. The class structure has three distinct branches. While the first branch deals with the structure and properties of one component at a time, the second branch deals with interactions between molecules of one or two components. The third branch contains all routines for the accumulation of data, determining the average values of the data as well as their statistical uncertainties. The first two branches are discussed in more detail below.

The first branch deals with the structure of one individual component. An instance of the class component stores the center of mass position, orientation, velocity etc. of every molecule of one species, e.g. H₂O. The contained modules perform calculations for all molecules of this species, e.g. solving Newton's equations of motion, calculating the kinetic energy, and transforming atom to molecule coordinates and vice versa. Furthermore, this branch is responsible for the initialization: it reads the molecular model of every component, calculates positions, introduces the reduced quantities etc.

The second branch deals with interactions between molecules of the same or different type. It therefore contains all force and potential energy calculations, which are by far the computationally most expensive parts. This branch is highly vectorized to achieve a fast calculation, in particular when executed on vector-parallel machines. The modules in this branch are specific to the simulation technique, i.e. MD or MC. In an MD simulation, the interaction partners of every molecule, i.e. other molecules or sites within the cut-off radius, are determined first. Then the interaction energies and virial contributions as well as the resulting forces and torques are calculated and summed up for every molecule.

In an MC simulation, the modules are designed to calculate only the energy and the virial. Forces are not evaluated, and the corresponding algorithms are therefore entirely omitted in these modules. For MD, all interactions of one component-component pair are calculated in one call of the module. For MC, the modules are finer-grained: the interactions of one component-component pair are obtained by determining every molecule-molecule interaction individually. The corresponding modules are thus called more frequently than in case of MD.

Fifth level - molecule. The basic information on a molecular model is stored on this level, such as mass, moment

of inertia tensor, rotational degrees of freedom etc. Higher level modules access these data via the class molecule.

Sixth level - site. Here, the assembly of a molecule is stored. The data are used by higher level modules to determine the site positions from the molecular positions and orientations, which allows for the calculation of site-site interactions. It should be noted that ms2 only integrates (MD) or accepts (MC) center of mass positions and orientations. Site specific data are calculated on the fly for every simulation step.

5.2. Parallelism

The source code is implemented for efficient parallelism with distributed memory, using the MPI [64] standard for communication. The program uses the following MPI calls:

• MPI_Init, MPI_Finalize, MPI_Comm_rank, MPI_Comm_size to set up basic MPI functionality

• MPI_Abort to stop the program in case of an error

• MPI_Barrier, MPI_Wtime and MPI_Wtick within the stopwatch class

• MPI_Bcast, MPI_Reduce, MPI_Allreduce for simulation data exchange

Only collective communication is used, an approach that suits molecular simulations in which the cut-off radius is of the order of the edge length of the simulation volume. For MD simulations with ms2, only the force calculation is parallelized, using molecule decomposition according to Plimpton [14]. The interaction matrix is rearranged such that the interaction partners are almost equally distributed among its rows. This is achieved by calculating the interactions between molecules 1 and N as interactions between molecules N and 1, cf. Figure 5. The parallelization scheme then distributes the calculation among N_P processors with an almost equal work load, such that every processor executes N/N_P rows of the interaction matrix. The master process then reduces all force components to sum up the resulting force on each molecule.
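One way to obtain such a balanced interaction matrix is to assign the partners of molecule i cyclically, i.e. to the next (N − 1)/2 molecules with wrap-around from N back to 1, so that every pair is counted exactly once and all rows have nearly the same length. The sketch below (0-based indices, hypothetical function names) follows this idea and splits the rows into contiguous blocks per process; it illustrates the principle under these assumptions, and the exact arrangement used by ms2 (Figure 5) may differ. In a real run, each process would accumulate partial forces for its rows, which are then summed by a reduction such as MPI_Reduce.

```python
def balanced_rows(n):
    """Row i lists the partners j of molecule i, taken cyclically, so that every
    unordered pair appears exactly once and row lengths differ by at most one."""
    half = (n - 1) // 2
    rows = []
    for i in range(n):
        partners = [(i + s) % n for s in range(1, half + 1)]
        if n % 2 == 0 and i < n // 2:
            # for even n, the pair at cyclic distance n/2 would fit two rows;
            # assign it only once
            partners.append(i + n // 2)
        rows.append(partners)
    return rows


def rows_for_rank(n, n_procs, rank):
    """Contiguous block of about n / n_procs rows handled by one process."""
    start = rank * n // n_procs
    stop = (rank + 1) * n // n_procs
    return range(start, stop)
```

Compared with the upper triangle of the interaction matrix, where row 1 holds N − 1 partners and row N none, every row now carries about (N − 1)/2 interactions, so equally sized row blocks imply nearly equal work per process.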

The MC code is parallelized by exploiting the stochastic nature of the simulation method. Starting from an equilibrated state in the simulation volume, the molecular configuration is copied multiple times into different volumes. Then, each copy runs independently in parallel, using a different random number seed, to calculate the thermodynamic properties. At the end, the data from all copies are gathered and averaged.

5.3. Vector-based structure

The program ms2 specifically accounts for vector-parallel machines. All the main information needed in the computationally costly inner loops of the calculation, like the positions of the molecules and their sites, is stored in vectors that are accessed sequentially. This allows for an efficient use of memory on vector-parallel machines.

For the evaluation of one configuration in ms2, the position vectors, for example, are loaded once and then distributed among all processors. Then, additional information like the interaction partners of each individual molecule is computed, tabulated in vectors and communicated. These data follow the same order as all other vectors, e.g. the position vector, which allows for sequential access to the data in all vectors in the random access memory. Therefore, the calculations can be vectorized, increasing the efficiency.

6. Benchmarks

A subset of the ms2 examples, which come with the code release, was used for the profiling and runtime tests presented here. The number of time steps (MD) or loops (MC) was lower than for a normal production run, since only comparisons were carried out. The MD and MC test cases simulate the equimolar liquid mixture of methanol and ethanol at 298.15 K and 0.1 MPa in the NpT ensemble [65]. Methanol was modeled by two LJ sites and three point charges and ethanol by three LJ sites and three point charges [66]. This test case was chosen because it is a typical application, and similar results are expected for a wide class of problems. However, note that the actual run times will differ with the simulated thermodynamic conditions; among other factors, run times increase with the density of the system.

6.1. Sequential version

Compiler. ms2 is distributed as source code and can be compiled with a wide variety of compilers. However, the performance is significantly influenced by the compiler and linker as well as by the options used. Figure 6 shows the runtime of the test case on different platforms using different compilers. The binaries were generated by

• GNU gfortran² with "-fdefault-real-8 -O3"

• Intel ifortran³ with "-r8 -fast"

• PGI pgf95⁴ with "-r8 -fastsse"

• Sun Studio sunf90⁵ with "-r8const -fast"

• Pathscale pathf90⁶ with "-r8 -Ofast"

with options activated to enforce the use of double precision floating point numbers. The chosen optimization flags represent a common choice.

² http://gcc.gnu.org/fortran/

³ http://software.intel.com/en-us/intel-compilers/

The best combination of compiler and platform was the Intel ifortran compiler and the Intel Xeon E5560 "Nehalem" processor⁷. An Intel ifortran compiled binary of ms2 significantly outperforms binaries compiled by GNU gfortran and PGI pgf90, independent of the computing platform.

Profiling. The test case was used for profiling studies, running the sequential version of ms2 with valgrind using the callgrind tool⁸. The binary was built with the Intel ifortran 11.1 compiler and optimized for SSSE3 to work with valgrind 3.5.0. The computational hot spots derived from the estimated CPU cycles are presented for both MD (Figure 7a) and MC (Figure 7b) simulations. The calculation of forces and energies consumes most of the CPU cycles. For the MD run, 54.33% of the CPU time is spent on charge-charge interactions and 36.09% on LJ-LJ interactions, which adds up to 90.42%. Together with further program routines, e.g. determining the interaction partners (7.21%), a total of 99.70% of the CPU time is spent on the calculation of potential energies and forces. In MC calculations, 95.68% of the time is spent on energy calculations, including the routine for determining the interaction partners (4.28%).

With valgrind set up to use a 32768 Byte, 4-way associative level 1 data cache with 64 Byte cachelines and

prefetch enabled, the cache miss rate for reading (D1r) was below 5%, cf. Table 2. The misses occur mainly in

the force calculation for MD (99.19%) and the energy calculation for MC (99.07%).

Since the read access rate is more than six times the write access rate, level 1 data read misses (D1mr) have a larger impact on the performance than level 1 data write misses (D1mw) and level 2 cache misses. Instruction cache misses are negligible here.

6.2. Vectorization

ms2 MD simulations also run efficiently on vector machines such as the NEC SX architectures. Table 3a shows profiling data for the MD test case on a NEC SX8R, obtained with "ftrace". The force calculation routines reached an average vector length of about 170 and 1892 MFLOPS, which is good, particularly since the test case is not favorable for vectorization. The MC algorithm, however, is not well suited for vectorization, cf. Table 3b. The NEC SX8 yields comparable results, scaled down in proportion to its lower processor clock speed, whereas the results for the NEC SX9 fall short of expectations given its higher peak performance and memory access speed. For comparison, the same test case was executed on a mainstream Intel Xeon E5440 "Harpertown" 2.83 GHz system, reading the hardware counters with PAPI. On this platform, the MD version achieved an average of 1461 MFLOPS and the MC version 1374 MFLOPS.

⁴http://www.pgroup.com/products/

⁵http://developers.sun.com/sunstudio/

⁶http://www.pathscale.com/

⁷http://www.hlrs.de/systems/platforms/nec-nehalem-cluster/

⁸http://valgrind.org/info/tools.html#callgrind
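Average vector lengths such as the values around 170 in Table 3 reflect the trip counts of the vectorized loops. The following simplified model illustrates this relation. It assumes a maximum vector length of 256 elements, as on the NEC SX series, and counts each strip of a strip-mined loop as one vector instruction; it is an illustration only, not the way ftrace derives its statistics, and the trip counts are made up.

```python
def strips(trip_count: int, max_vl: int = 256) -> list:
    """Strip-mine a loop of trip_count iterations into vector lengths <= max_vl."""
    full, rest = divmod(trip_count, max_vl)
    return [max_vl] * full + ([rest] if rest else [])


def average_vector_length(trip_counts, max_vl: int = 256) -> float:
    """Elements per vector instruction, averaged over all strips of all loops."""
    lengths = [vl for n in trip_counts for vl in strips(n, max_vl)]
    if not lengths:
        return 0.0
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    # An inner loop over ~170 interaction partners fits into a single strip.
    print(average_vector_length([170]))
    # A 300-iteration loop needs one full strip and one short strip of 44.
    print(average_vector_length([300]))
    # Long loops consist mostly of full strips.
    print(average_vector_length([1000] * 10))
```

In this model, short or awkwardly sized loops lower the average vector length, which is one reason why a test case with a moderate number of interaction partners per molecule is not ideal for vector hardware.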

6.3. Parallelization

ms2 supports parallel systems with distributed memory, using the MPI standard for communication. Apart from the modules ms2_global and ms2_stopwatch, only three different MPI calls are used to exchange data: MPI_Bcast (42 times), MPI_Reduce (13 times) and MPI_Allreduce (12 times). Following the profiling results of section 6.1, only the force calculation was parallelized for MD; a master process therefore has to reduce all force components to obtain the resulting molecular forces. MC calculations use this kind of parallelization only in the equilibration phase. During production, the phase space is sampled by fully independent random walks on each processing unit, which generate independent Markov chains that have to be consolidated only once at the end of the simulation.
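The MD scheme described above, in which each process computes the forces for its share of the molecules and a reduction sums the force components, can be emulated serially. The sketch below is a simplified illustration under stated assumptions: single-site Lennard-Jones molecules in reduced units, no cut-off, no periodic boundaries, contiguous blocks of molecules per hypothetical process, and a plain element-wise sum in place of MPI_Reduce. It only shows that the reduced partial forces equal the serial result; it is not the ms2 implementation.

```python
def lj_pair_force(ri, rj, eps=1.0, sigma=1.0):
    """Lennard-Jones force on particle i due to particle j (reduced units, no cut-off)."""
    d = [a - b for a, b in zip(ri, rj)]
    r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2]
    sr6 = (sigma * sigma / r2) ** 3
    # f_i = -du/dr * d / r = 24 eps (2 (s/r)^12 - (s/r)^6) / r^2 * d
    scale = 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r2
    return [scale * x for x in d]


def partial_forces(positions, owned):
    """Forces accumulated by one hypothetical process owning molecules i in `owned`.

    Every pair (i, j) with j > i is evaluated exactly once, by the owner of i;
    Newton's third law adds the opposite force to molecule j.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in owned:
        for j in range(i + 1, n):
            f = lj_pair_force(positions[i], positions[j])
            for k in range(3):
                forces[i][k] += f[k]
                forces[j][k] -= f[k]
    return forces


def sum_reduce(partials):
    """Element-wise sum of all partial force arrays (the role of an MPI sum reduction)."""
    total = [[0.0, 0.0, 0.0] for _ in partials[0]]
    for forces in partials:
        for m, f in enumerate(forces):
            for k in range(3):
                total[m][k] += f[k]
    return total


if __name__ == "__main__":
    positions = [[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.2, 0.0],
                 [1.0, 1.0, 1.0], [2.0, 0.5, 0.3]]
    blocks = [range(0, 2), range(2, 4), range(4, 5)]  # three "processes"
    reduced = sum_reduce([partial_forces(positions, b) for b in blocks])
    serial = partial_forces(positions, range(len(positions)))
    print(max(abs(a - b) for fr, fs in zip(reduced, serial) for a, b in zip(fr, fs)))
```

Because every pair has exactly one owner and Newton's third law is applied locally, no contribution is counted twice, and the reduction recovers the serial forces up to round-off. The MC production phase avoids such a per-step reduction altogether, since the independent Markov chains are only combined at the end.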

The parallelization capabilities of ms2 were tested on a "Nehalem" cluster with dual quad-core CPUs and InfiniBand, using the Intel ifortran compiler with OpenMPI V1.4 as well as Intel MPI V4.0. The runtimes of the test case for a range of commonly used numbers of processes are shown in Figure 8. When more than a single node was used, i.e. more than 8 processes here, Intel MPI showed better scalability than OpenMPI. While MD scales well up to a moderate number of processes, the MC production steps can be characterized as optimally parallel. In contrast, the MC equilibration steps, which are parallelized with a different strategy, show the expected, less favorable scaling behavior. A closer look at the time spent on communication reveals the difference between equilibration and production for a parallel MC run with 32 processes: after the equilibration, which takes roughly 23 seconds in this case, no communication takes place, cf. Figure 9a.
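Figure 8 distinguishes strong (Amdahl) and weak (Gustafson) scaling. For reference, the textbook speedup laws are sketched below; the parallel fraction f is a hypothetical input, not a value measured for ms2. An optimally parallel workload such as the MC production phase corresponds to f close to 1.

```python
def amdahl_speedup(parallel_fraction: float, processes: int) -> float:
    """Strong scaling: fixed problem size; the serial part (1 - f) does not shrink."""
    f = parallel_fraction
    return 1.0 / ((1.0 - f) + f / processes)


def gustafson_speedup(parallel_fraction: float, processes: int) -> float:
    """Weak scaling: the parallel part of the work grows with the number of processes."""
    f = parallel_fraction
    return (1.0 - f) + f * processes


def parallel_efficiency(speedup: float, processes: int) -> float:
    """Speedup divided by the number of processes (1.0 means perfect scaling)."""
    return speedup / processes


if __name__ == "__main__":
    for p in (8, 16, 32):
        s = amdahl_speedup(0.99, p)
        print(p, round(s, 2), round(parallel_efficiency(s, p), 2), round(gustafson_speedup(0.99, p), 2))
```

With f = 0.99, for example, Amdahl's law limits the strong-scaling speedup on 32 processes to about 24, whereas Gustafson's law gives about 31.7 when the problem size grows with the number of processes.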

For MD, MPI communication is necessary throughout the whole simulation, with communication phases in every time step. Figure 9b shows the communication needed for a single time step: the barrier induced by the first collective call causes waiting times for the faster processes, particularly for the last process, which may be notably faster than the others, cf. Figure 9c. This imbalance arises because the force calculation for N molecules is divided among the N_p processes such that each process is assigned ⌈N/N_p⌉ molecules, except the last one, which is assigned the remainder. For the MD test case with 1372 molecules in Figure 9, 31 processes each calculate the forces for 43 molecules, while the last process deals with only 39 molecules. This load imbalance is not a serious issue: even with a distribution in which the number of molecules per process differs by at most one, some processes would still have 43 molecules and dictate the overall execution time. For the test case, the fraction of time spent in MPI routines increased from about 5% with 16 processes to about 10% with 32 processes. Within the collective communication, process 0 is the master and therefore exhibits a different communication pattern, cf. Figure 9.
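The arithmetic behind this load imbalance can be reproduced in a few lines. The sketch compares the ⌈N/N_p⌉ scheme described above with a hypothetical balanced scheme in which the counts differ by at most one; the function names are ours.

```python
def ceil_distribution(n_molecules: int, n_procs: int) -> list:
    """ceil(N/Np) molecules per process; the last process takes whatever is left."""
    chunk = -(-n_molecules // n_procs)  # integer ceiling division
    counts = []
    remaining = n_molecules
    for _ in range(n_procs):
        take = min(chunk, remaining)
        counts.append(take)
        remaining -= take
    return counts


def balanced_distribution(n_molecules: int, n_procs: int) -> list:
    """Counts differ by at most one: the first N mod Np processes get one extra molecule."""
    base, extra = divmod(n_molecules, n_procs)
    return [base + 1 if p < extra else base for p in range(n_procs)]


if __name__ == "__main__":
    for scheme in (ceil_distribution, balanced_distribution):
        counts = scheme(1372, 32)
        # The process with the most molecules sets the pace of every time step.
        print(scheme.__name__, "max:", max(counts), "min:", min(counts))
```

For N = 1372 and N_p = 32, the first scheme gives 31 processes with 43 molecules and one with 39, the balanced scheme 28 processes with 43 and four with 42. Both have a maximum of 43 molecules per process, which is why redistributing the molecules would not shorten the execution time.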

6.4. Memory

The memory demand of ms2 is low, e.g. roughly 200 MB of RAM are required for a simulation of 1372 molecules with five interaction sites. For parallel execution, the concept of distributed memory is used, so that the total memory demand scales linearly with the number of processes. For most applications of ms2, a total RAM of 2 GB is sufficient.

6.5. Comparison to other codes

Monte-Carlo. The performance of ms2 was evaluated against the simulation program MCCCS Towhee V6.2.7 [67], which is widely used and well accepted in the scientific thermodynamics community. In a first step, the comparison was restricted to a simple NVT simulation of the test case. ms2 and MCCCS Towhee, both compiled with the Intel ifortran compiler, executed the MC test case sequentially, i.e. with a single process, on a "Nehalem"/InfiniBand cluster with an equal number of MC moves. A sequential run was chosen since MCCCS Towhee is not yet prepared for parallel execution. For this simulation, ms2 was faster than MCCCS Towhee by a factor of about 20.

An important application of ms2 is the determination of VLE data. For this task, the comparison was restricted to two other programs: MCCCS Towhee [67] and the Gibbs ensemble Monte-Carlo code of Errington [68]. Both codes use the Gibbs ensemble approach for the determination of VLE data, a simulation scenario that is not yet supported by ms2. Instead, the grand equilibrium method was used in ms2.

The test case was again a mixture of methanol and ethanol, here at T = 393 K, using N_l = 1372 molecules for the liquid phase and N_v = 500 molecules for the vapor phase. These numbers were also used as starting values for the Gibbs ensemble calculations. For MCCCS Towhee and the Errington code, three types of moves were allowed: translational and rotational displacement, volume exchange, and molecule transfer between the two simulation volumes. These moves were chosen with probabilities of 79%, 1% and 20%, respectively, which are typical values for simulations including dense liquid phases. In ms2, each loop of the liquid phase simulation consisted of two types of moves: translational and rotational displacements (N_l times) and one volume fluctuation. In each loop of the vapor phase simulation, an insertion and deletion move was additionally invoked twice. The overall execution time was set to 96 hours, which is about the maximum time currently granted for a single simulation at high performance computing centers. The runs were performed sequentially, since MCCCS Towhee and the Errington code are not suited for parallel execution.
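Selecting the move type with fixed probabilities, as in the Gibbs ensemble runs above, is typically done by comparing a uniform random number with cumulative probabilities. The following generic sketch uses the quoted probabilities of 79%, 1% and 20%; it is not taken from MCCCS Towhee or the Errington code, and the move names are placeholders.

```python
import random
from collections import Counter

# Move probabilities quoted in the text for the Gibbs ensemble runs.
MOVE_PROBABILITIES = [
    ("displacement", 0.79),     # translational and rotational displacement
    ("volume_exchange", 0.01),
    ("transfer", 0.20),         # molecule transfer between the two boxes
]


def pick_move(rng: random.Random) -> str:
    """Select a move type by comparing a uniform number with cumulative probabilities."""
    u = rng.random()
    cumulative = 0.0
    for name, probability in MOVE_PROBABILITIES:
        cumulative += probability
        if u < cumulative:
            return name
    return MOVE_PROBABILITIES[-1][0]  # guard against round-off in the cumulative sum


if __name__ == "__main__":
    rng = random.Random(42)
    n_moves = 100_000
    counts = Counter(pick_move(rng) for _ in range(n_moves))
    for name, _ in MOVE_PROBABILITIES:
        print(name, counts[name] / n_moves)
```

Drawing the move type at random, rather than in a fixed sequence, keeps the sampling scheme symmetric with respect to the move order.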

For this test case, ms2 showed the best performance. Within the 96 h, ms2 completed 52000 loops in the liquid phase simulation and 50000 loops in the vapor phase simulation. The Errington code performed second best with 22000 loops, followed by MCCCS Towhee with 6000 loops.

The performance of the codes had a drastic influence on the quality of the simulation results. For the VLE test case, the reference phase equilibrium data taken from the literature comprise a vapor pressure of 0.58 MPa, a saturated liquid density of 17.22 mol/l and a saturated vapor density of 0.19 mol/l. ms2 reproduced these experimental data well within the given time: the calculated vapor pressure was 0.53 MPa, the saturated liquid density of the mixture 17.54 mol/l and the saturated vapor density 0.18 mol/l. The statistical uncertainties were acceptably low, namely 0.01 MPa for the vapor pressure and 0.03 mol/l and 0.003 mol/l for the saturated liquid and vapor densities, respectively. Within the given time, the Errington code yielded a vapor pressure of 0.19 MPa, a saturated liquid density of 19.93 mol/l and a saturated vapor density of 0.443 mol/l. MCCCS Towhee yielded a vapor pressure of 1.81 MPa and saturated liquid and vapor densities of 13.62 mol/l and 0.64 mol/l, respectively. Here, the standard deviations of the results were significantly higher, and the results showed a drift, e.g. the saturated liquid density drifted by 1.50 mol/l on average over 1000 steps. The results for the Errington code and MCCCS Towhee indicate that the simulations with these programs had not reached phase equilibrium within 96 h. This effect is much more pronounced for MCCCS Towhee, which executed significantly fewer loops than the Errington code.
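To put the numbers above side by side, the following sketch computes the relative deviations of the three codes' results from the reference data. All input values are those quoted in the text; only the helper is ours.

```python
# Reference data and results for the methanol + ethanol VLE test case at 393 K,
# as quoted in the text (p in MPa, densities in mol/l).
REFERENCE = {"p": 0.58, "rho_liq": 17.22, "rho_vap": 0.19}
RESULTS = {
    "ms2": {"p": 0.53, "rho_liq": 17.54, "rho_vap": 0.18},
    "Errington": {"p": 0.19, "rho_liq": 19.93, "rho_vap": 0.443},
    "Towhee": {"p": 1.81, "rho_liq": 13.62, "rho_vap": 0.64},
}


def relative_deviation(value: float, reference: float) -> float:
    """Relative deviation from the reference value in percent."""
    return 100.0 * (value - reference) / reference


if __name__ == "__main__":
    for code, props in RESULTS.items():
        devs = {key: relative_deviation(val, REFERENCE[key]) for key, val in props.items()}
        print(code, ", ".join(f"{key}: {dev:+.1f}%" for key, dev in devs.items()))
```

This gives about -9%, +2% and -5% for the vapor pressure, saturated liquid density and saturated vapor density from ms2, compared with deviations of up to about 130% for the Errington code and about 240% for MCCCS Towhee.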

Molecular dynamics. The performance of the MD part of ms2 was evaluated against the simulation program Gromacs V4.0.3 [69], which is designed for simulations of biological systems. The comparison was restricted to a simple NVT simulation of the equimolar liquid mixture of ethanol and methanol at T = 298 K and p = 0.1 MPa. The runs were executed sequentially on a "Harpertown"/InfiniBand cluster with an equal number of time steps and the same interaction cut-off radius of 21 Å. Both programs, ms2 and Gromacs, were compiled with the Intel ifortran compiler. Gromacs was almost twice as fast as ms2, and its parallel version also scaled better. This shows that there is still optimization potential in ms2, which will be exploited in the future.

A comparison between Gromacs and ms2 for the VLE test case was not carried out, since Gromacs does not support VLE simulations.

6.6. Computational demand for the calculation of transport properties

The computational demand for the calculation of transport properties following the Green-Kubo formalism was evaluated for the test case, i.e. the equimolar liquid mixture of methanol and ethanol at T = 298 K and p = 0.1 MPa, with an autocorrelation length of 13.82 ps. The time span between the origins of two successive autocorrelation functions was set to 197.4 fs. A total of 100 autocorrelation functions was explicitly written to file in order to check the results, and the transport properties were evaluated after every four autocorrelation functions. The reference case was an MD run with the same technical details, except that no autocorrelation functions were calculated. Calculating the transport properties increased the CPU time per simulation step by about 78% compared to the reference case. Note that all autocorrelation functions were evaluated in every time step; this is an extreme frequency that can be reduced without much loss of accuracy.
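The Green-Kubo evaluation with overlapping autocorrelation functions can be sketched for the simplest case, the self-diffusion coefficient D = (1/3) ∫ ⟨v(0)·v(t)⟩ dt of a single molecule. The sketch is a post-processing variant under stated assumptions: velocities sampled every time step, time origins spaced a fixed number of steps apart, and trapezoidal integration. The function names are hypothetical, and a production code such as ms2 accumulates the functions during the run, averages over all molecules and uses other fluxes for viscosity and thermal conductivity. With the settings of the text (13.82 ps length, a new origin every 197.4 fs), about 70 autocorrelation functions would be in progress at any time in such a scheme.

```python
def velocity_acf(velocities, corr_len, origin_stride):
    """<v(0) . v(t)> for lags 0 .. corr_len - 1, averaged over time origins.

    velocities: (vx, vy, vz) tuples of one molecule, one per time step.
    A new autocorrelation function starts every origin_stride steps.
    """
    acf = [0.0] * corr_len
    n_origins = 0
    for t0 in range(0, len(velocities) - corr_len + 1, origin_stride):
        v0 = velocities[t0]
        for lag in range(corr_len):
            vt = velocities[t0 + lag]
            acf[lag] += v0[0] * vt[0] + v0[1] * vt[1] + v0[2] * vt[2]
        n_origins += 1
    if n_origins == 0:
        raise ValueError("trajectory shorter than one autocorrelation function")
    return [c / n_origins for c in acf]


def trapezoid(values, dt):
    """Trapezoidal rule for samples spaced dt apart."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def self_diffusion(velocities, dt, corr_len, origin_stride):
    """Green-Kubo estimate D = (1/3) * integral of the velocity autocorrelation."""
    return trapezoid(velocity_acf(velocities, corr_len, origin_stride), dt) / 3.0


if __name__ == "__main__":
    # Autocorrelation functions running at the same time for a length of
    # 13.82 ps and a new time origin every 197.4 fs (both in fs here).
    print(round(13.82e3 / 197.4))
```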

7. Features

The feature programs described in the following are intended to facilitate the handling of ms2. They allow for an easy start of molecular simulations with ms2 and give access to all output data generated by ms2.

7.1. Simulation setup: ms2par

The GUI-based feature tool ms2par allows for a convenient generation of input (*.par) files according to user specifications, cf. Figure 10. It sets the molecular model, ensemble, thermodynamic state point, time step, number of equilibration steps etc. Furthermore, the tool proposes a cut-off radius based on the input data. After the user completes the specifications, the program generates the *.par file in ASCII format, to be read by ms2. Applying ms2par is simple and should be intuitive even for new users. ms2par is a Java application and therefore runs on all operating systems that provide a Java runtime environment.

7.2. Simulation analysis: ms2chart

ms2chart is a GUI-based Java applet for evaluating the simulation results of ms2, cf. Figure 11. This tool plots the evolution of the properties calculated by ms2. The quantities on the graph axes are chosen from a drop-down menu, and the plot is shown directly in the GUI. For a better analysis, ms2chart offers various features: plotting block averages as well as simulation averages into the same plot, changing the design of the plot, individual labeling of the axes and adjusting the frame detail. All plots can be exported in PNG format.

The analysis program ms2chart can be executed at any time, i.e. also while the simulation is still running; running it on the fly causes no loss of data. ms2chart is easy to understand and can be handled intuitively. It allows for an easy first analysis of ms2 simulation results, e.g. a quick check whether the equilibration time was appropriate, which is important if extensive series of simulation runs are performed.

7.3. Visualization tool: ms2molecules

The program ms2molecules is a visualization tool for ms2. It displays the molecular trajectories that ms2 stores as a series of configurations in the *.vim file. ms2molecules represents molecular sites by colored spheres with user-defined colors. By default, the size of a sphere is proportional to the LJ size parameter σ, but it can be changed manually in the first lines of the *.vim file. This feature facilitates monitoring a component of interest by reducing the size of other components, e.g. solvents. Other features, like zooming into or out of the simulation volume and rotating it, also facilitate the analysis of simulations. Snapshots of the visualization can be exported in JPG format.

The program is based on OpenGL and written in C. Its handling is simple and console based. It requires OpenGL in a Windows or Linux environment. Figure 12 shows a snapshot of a ternary mixture taken with ms2molecules; such views are convenient for gaining insight into the trajectory or the state of the system.

8. Conclusions

The molecular simulation program ms2 was designed for the calculation of thermodynamic properties of bulk fluids. Special care was given to minimizing the response time. The capabilities of ms2 are broad, ranging from basic static thermodynamic properties, like thermal and caloric data, over vapor-liquid equilibria to transport properties, like the diffusion coefficient, viscosity and thermal conductivity. These data are accessible for pure substances and mixtures. The simulation data generated by ms2 are highly accurate, at a reasonable computational effort. Molecular models of more than 100 pure fluids that accurately describe their thermodynamic properties are supplied with ms2. Although ms2 is a sophisticated Fortran90 program, new developers benefit from its modular structure and object orientation. The application of ms2 is straightforward, because the input files are well structured and auxiliary feature tools help to create input files, analyze simulation results and visualize molecular trajectories. The code was optimized for current hardware technology and achieves a high efficiency.

Ongoing efforts focus on the implementation of new algorithms that extend the applicability of the program as well as its analysis tools. The two major current developments are the implementation of internal degrees of freedom, to allow for the application of ms2 to larger molecules, and of Ewald summation, to allow for the application of ms2 to charged molecules. The source code is available to the scientific community at http://www.ms-2.de.

9. Acknowledgments

The authors gratefully acknowledge financial support by the BMBF "01H08013A - Innovative HPC-Methoden und Einsatz für hochskalierbare Molekulare Simulation" and computational support by the Steinbuch Centre for Computing under the grant LAMO and the High Performance Computing Center Stuttgart (HLRS) under the grant MMHBF. The present research was conducted under the auspices of the Boltzmann-Zuse Society of Computational Molecular Engineering (BZS).

Table 1: Computational demand for the calculation of the chemical potential for a three-center LJ fluid using Widom's test molecule insertion in MD or MC simulation. The calculations were performed on a single processor. The simulation time was 0.043 s per time step (MD) and 0.300 s per loop (MC). The simulations were performed with 500 molecules at a density of 0.23 mol/l. A cut-off radius of 60 Å was employed.

Number of test molecules | Additional CPU time per MD time step or MC loop / s
1 | 0.0015
500 | 0.1925
1000 | 0.3865
1500 | 0.5810
2000 | 0.7750
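The values in Table 1 grow almost linearly with the number of test molecules. A least-squares line through the tabulated values, computed below with a small helper of our own, gives a slope of about 0.39 ms of additional CPU time per test molecule and an intercept close to zero.

```python
# Table 1: number of test molecules and additional CPU time per MD step or MC loop (s).
N_TEST = [1, 500, 1000, 1500, 2000]
EXTRA_CPU_S = [0.0015, 0.1925, 0.3865, 0.5810, 0.7750]


def least_squares(x, y):
    """Slope and intercept of an ordinary least-squares straight line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    slope, intercept = least_squares(N_TEST, EXTRA_CPU_S)
    print(f"{1e3 * slope:.3f} ms per test molecule, intercept {intercept:.4f} s")
```

The per-molecule cost is specific to the machine and the model of Table 1, so it should be read as an order of magnitude rather than as a general constant.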


Table 2: ms2 data cache behavior for the test case investigated with callgrind (D: data, 1m/2m: level 1/level 2 cache misses, r/w: read/write accesses). The data misses are normalized by the total amount of data accesses.

 | D1mr/Dr [%] | D1mw/Dw [%] | D2mr/Dr [%] | D2mw/Dw [%] | Dr/Dw
MD | 2.41 | 0.34 | 0.0001 | 0.0002 | 6.18
MC | 4.98 | 7.48 | 0.006 | 2.04 | 10.35


Table 3: NEC SX8R "ftrace" profiling. The five routines with the largest computational demand and their vectorization for the test case are shown for MD (a) and MC (b).

Columns: PROG.UNIT | FREQUENCY | EXCLUSIVE TIME [sec] (%) | AVER.TIME [msec] | MOPS | MFLOPS | V.OP RATIO | AVER. V.LEN | VECTOR TIME | I-CACHE MISS | O-CACHE MISS | BANK CONF CPU | BANK CONF NETWORK

(a)
ms2_potential.tpotcc_force | 126900 | 598.058 (53.4) | 4.713 | 8661.0 | 2164.7 | 99.53 | 169.3 | 579.369 | 0.012 | 3.383 | 3.487 | 13.433
ms2_potential.tpotljlj_force | 89300 | 373.605 (33.4) | 4.184 | 8039.8 | 1804.6 | 99.60 | 170.0 | 360.055 | 0.010 | 2.775 | 2.317 | 8.112
ms2_interaction.tinteraction_force | 14100 | 118.367 (10.6) | 8.395 | 1749.6 | 429.2 | 80.68 | 172.6 | 36.326 | 0.025 | 39.243 | 0.092 | 0.961
ms2_interaction.tinteraction_calcpartners | 14100 | 27.200 (2.4) | 1.929 | 14262.1 | 3412.8 | 99.28 | 191.2 | 25.795 | 0.021 | 0.091 | 0.169 | 0.830
ms2_component.tcomponent_restartsave | 2 | 0.302 (0.0) | 151.159 | 341.9 | 7.7 | 0.14 | 3.3 | 0.003 | 0.048 | 0.010 | 0.000 | 0.001
total | 648179 | 1119.044 (100.0) | 1.726 | 7855.1 | 1891.6 | 99.09 | 170.5 | 1002.287 | 0.265 | 45.647 | 6.157 | 23.568

(b)
ms2_interaction.tinteraction_energy | 10155544 | 5055.675 (98.3) | 0.498 | 2388.1 | 563.9 | 69.23 | 162.3 | 635.244 | 22.679 | 827.141 | 6.305 | 45.971
ms2_component.tcomponent_mol2atom1 | 5604792 | 25.066 (0.5) | 0.004 | 305.0 | 41.3 | 0.00 | 0.0 | 0.000 | 3.765 | 7.560 | 0.001 | 0.453
ms2_ensemble.tensemble_energy1 | 3430000 | 13.135 (0.3) | 0.004 | 1271.2 | 492.5 | 87.92 | 237.8 | 2.329 | 1.990 | 4.316 | 0.001 | 0.163
ms2_ensemble.tensemble_move | 1714383 | 9.818 (0.2) | 0.006 | 144.8 | 7.1 | 7.24 | 3.0 | 0.395 | 2.692 | 2.778 | 0.000 | 0.216
ms2_ensemble.tensemble_rotate | 1715617 | 8.658 (0.2) | 0.005 | 141.9 | 9.9 | 0.00 | 0.0 | 0.000 | 2.491 | 2.313 | 0.000 | 0.199
total | 44555359 | 5142.345 (100.0) | 0.115 | 2366.6 | 559.1 | 69.33 | 162.7 | 646.942 | 38.509 | 849.895 | 6.401 | 47.877


Figure 1: Schematic of the gradual insertion of a molecule. The molecule (dark gray) in state π_l fluctuates between state l = 0 (fully decoupled) and state l = k (fully coupled).


Figure 2: Schematic of the angles θ_i, θ_j and φ_ij between two point dipoles i and j, indicated by the arrows, which are situated on different molecules at a distance r_ij.


Figure 3: Schematic of the different cut-off modes. Using the site-site cut-off mode, the interaction between the two sites marked by a dot (site i and site j) is explicitly evaluated, since they are within the cut-off radius indicated by the dotted line. The distance between the centers of mass (COM), marked by a cross, is irrelevant. Using the center of mass cut-off mode, none of the site-site interactions of the two molecules are explicitly evaluated, since the centers of mass are not within the cut-off radius. The distance between the center of mass and one particular site of molecule i is defined as dr_i.



Figure 4: UML class diagram of ms2, showing object attributes. The bold headers specify the class names, while the italic names correspond to the program files in which the classes are implemented. The arrows indicate the command structure, while the bold arrows indicate information transfer. For a better understanding of the structure, the following abbreviations are used in the diagram: CC: ChargeCharge; CD: ChargeDipole; CQ: ChargeQuadrupole; DC: DipoleCharge; DD: DipoleDipole; DQ: DipoleQuadrupole; QC: QuadrupoleCharge; QD: QuadrupoleDipole; QQ: QuadrupoleQuadrupole.


Figure 5: Schematic of the molecule decomposition parallelization method by Plimpton [14]. The colored boxes signify the pair interactions that need to be determined in order to advance the simulation. The black lines delimit the range of molecules for which each processor calculates the interactions.


Figure 6: Runtime for the equimolar liquid mixture of methanol and ethanol at 298.15 K and 0.1 MPa in the NpT ensemble using different compilers and processors. The simulation was executed with 1372 molecules, with MD for 200 NVT equilibration time steps, 500 NpT equilibration time steps and 4000 production time steps, and with MC for 50 NVT equilibration loops, 200 NpT equilibration loops and 1000 production loops.



Figure 7: CPU cycles for one MD step (a) and one MC loop (b) of the test case, estimated by valgrind. Functions with a CPU time share below 1% are not shown.



Figure 8: ms2 runtime of the test case with MC (a) and MD (b) on a "Nehalem"/InfiniBand cluster for Amdahl and Gustafson scaling. Full symbols: Intel MPI V4.0, empty symbols: OpenMPI V1.4.1; 2: N = 1372, 3: N = 2744, ◦: N = 5488, ▽: N = 21952. The dashed lines indicate perfect strong scaling, while the full lines indicate the achieved weak scaling.



Figure 9: Communication between 32 processes executing the test case. Green indicates calculation, red indicates communication. Panel (a) shows the summary timeline for the MC simulation, while the lower panels show the summary timeline for the MD simulation: (b) shows the communication for one entire MD step, and (c) gives a magnified view of the communication pattern at the end of the time step.


Figure 10: Snapshot of ms2par generating an input (*.par) file for an MD simulation of pure liquid ethylene oxide.


Figure 11: Snapshot of the tool ms2chart. The picture shows the analysis of a simulation of liquid cyclohexane.


Figure 12: Snapshot of a ternary mixture taken with the visualization tool ms2molecules. The mixture consists of ethane (red), methane (orange), and carbon dioxide (green). The white frame illustrates the size of the simulation volume.


References

[1] http://www.ﬂuidproperties.org, Industrial ﬂuid properties simulation collective, 2010.

[2] Gubbins, K. E. and Moore, J. D., Industrial & Engineering Chemistry Research 49 (2010) 3026.

[3] Tan, S. P., Adidharma, H., and Radosz, M., Industrial & Engineering Chemistry Research 47 (2008) 8063.

[4] Kleiner, M., Tumakaka, F., and Sadowski, G., Molecular Thermodynamics of Complex Systems 131 (2009) 75.

[5] Klamt, A., The Journal of Physical Chemistry 99 (1995) 2224.

[6] Klamt, A. and Eckert, F., Fluid Phase Equilibria 172 (2000) 43.

[7] Gupta, S. and Olson, J., Industrial & Engineering Chemistry Research 42 (2003) 6359.

[8] Case, F. H. et al., Fluid Phase Equilibria 274 (2008) 2.

[9] Case, F. et al., Fluid Phase Equilibria 217 (2004) 1.

[10] Case, F. et al., Fluid Phase Equilibria 236 (2005) 1.

[11] Case, F. H. et al., Fluid Phase Equilibria 260 (2007) 153.

[12] Case, F. H. et al., Fluid Phase Equilibria 285 (2009) 1.

[13] Lyubartsev, A. P. and Laaksonen, A., Computer Physics Communications 128 (2000) 565.

[14] Plimpton, S., The Journal of Computational Physics 117 (1995) 1.

[15] The Economist March 13th (2010) 68.

[16] Green, M., The Journal of Chemical Physics 22 (1954) 398.

[17] Kubo, R., The Journal of The Physical Society of Japan 12 (1957) 570.

[18] Poncela, A., Rubio, A. M., and Freire, J. J., Molecular Physics 91 (1997) 189.

[19] Kristof, T., Vorholz, J., Liszi, J., Rumpf, B., and Maurer, G., Molecular Physics 97 (1999) 1129.

[20] Ketko, M. H., Rafferty, J., Siepmann, J. I., and Potoff, J. J., Fluid Phase Equilibria 274 (2008) 44.

[21] Bourasseau, E., Haboudou, M., Boutin, A., Fuchs, A. H., and Ungerer, P., The Journal of Chemical Physics

118 (2003) 3020.

[22] Eckl, B., Vrabec, J., and Hasse, H., Chemie Ingenieur Technik 80 (2008) 25.

[23] Eckl, B., Vrabec, J., and Hasse, H., Molecular Physics 106 (2008) 1039.


[24] Eckl, B., Vrabec, J., and Hasse, H., Fluid Phase Equilibria 274 (2008) 16.

[25] Huang, Y.-L., Vrabec, J., and Hasse, H., Fluid Phase Equilibria 287 (2009) 62.

[26] Eckl, B., Horsch, M., Vrabec, J., and Hasse, H., High Performance Computing In Science And Engineering

’08 (2009) 119.

[27] Merker, T., Guevara-Carrion, G., Vrabec, J., and Hasse, H., High Performance Computing In Science And

Engineering ’08 (2009) 529.

[28] Vrabec, J., Huang, Y.-L., and Hasse, H., Fluid Phase Equilibria 279 (2009) 120.

[29] Allen, M. and Tildesley, D., Computer Simulation of Liquids, Clarendon Press, Oxford, 1987.

[30] Vrabec, J. and Hasse, H., Molecular Physics 100 (2002) 3375.

[31] Widom, B., The Journal of Chemical Physics 39 (1963) 2808.

[32] Lyubartsev, A. P., Martinovski, A. A., Shevkunov, S. V., and Vorontsov-Velyaminov, P. N., The Journal of Chemical Physics 96 (1992) 1776.

[33] Nezbeda, I. and Kolafa, J., Molecular Simulation 5 (1991) 391.

[34] Frenkel, D. and Smith, B., Understanding Molecular Simulation, Academic Press, Elsevier, San Diego,

1993.

[35] Rapaport, D., The Art of Molecular Dynamics Simulation, Cambridge University Press, Cambridge, 2004.

[36] Born, M. and von Karman, T., Physikalische Zeitschrift 13 (1912) 297.

[37] Metropolis, N., Rosenbluth, A., Rosenbluth, M. N., Teller, A. H., and Teller, E., The Journal of Chemical

Physics 21 (1953) 1087.

[38] Rowley, R. et al., DIPPR Data Compilation of Pure Compound Properties, Design Institute for Physical

Properties, AIChE, 2003.

[39] Flyvbjerg, H. and Petersen, H., The Journal of Chemical Physics 91 (1989) 461.

[40] Lyubartsev, A., Laaksonen, A., and Vorontsov-Velyaminov, P., Molecular Physics 82 (1994) 455.

[41] Vrabec, J., Kettler, M., and Hasse, H., Chemical Physics Letters 356 (2002) 431.

[42] Shing, K., Gubbins, K., and Lucas, K., Molecular Physics 65 (1988) 1235.

[43] Gray, C. and Gubbins, K., Theory of molecular ﬂuids, Volume 1: Fundamentals, Clarendon Press, Oxford,

1984.

[44] Gubbins, K., Statistical Mechanics, volume 1, The Chemical Society Burlington House, London, 1972.


[45] Krishna, R. and van Baten, J. M., Industrial & Engineering Chemistry Research 44 (2005) 6939.

[46] Hoheisel, C., Physics Reports 245 (1994) 111.

[47] Barton, A., The dynamic liquid state, Longman, London, 1974.

[48] Steele, W., Transport Phenomena in ﬂuids, Marcel Dekker, New York, 1969.

[49] Lennard-Jones, J., Proceedings of the Physical Society 43 (1931) 461.

[50] Berthelot, D., Comptes Rendus Hebdomadaires des Séances de l'Académie des Sciences 126 (1898) 1703.

[51] Lorentz, H., Annalen der Physik (1881) 127.

[52] Fischer, J., Möller, D., Chialvo, A., and Haile, J. M., Fluid Phase Equilibria 48 (1989) 161.

[53] Stone, A. J., Science 321 (2008) 787.

[54] Hirschfelder, J., Curtiss, C., and Bird, R., Molecular Theory of Gases and Liquids, J. Wiley & Sons, 1954.

[55] Nymand, T. and Linse, P., The Journal of Chemical Physics 112 (2000) 6152.

[56] Steinhauser, O., Molecular Physics 45 (1982) 335.

[57] van der Spoel, D., van Maaren, P., and Berendsen, H., The Journal of Chemical Physics 108 (1998) 10220.

[58] Ewald, P., Annalen der Physik 64 (1921) 253.

[59] Lisal, M., Budinsky, R., and Vacek, V., Fluid Phase Equilibria 135 (1997) 193.

[60] Garzon, B., Lago, S., Vega, C., and Rull, L., The Journal of Chemical Physics 102 (1995) 7204.

[61] Lisal, M., William, W., and Nezbeda, I., Fluid Phase Equilibria 181 (2001) 127.

[62] Jedlovszky, P. and Mezei, M., The Journal of Chemical Physics 110 (1999) 2991.

[63] Lustig, R., Molecular Physics 65 (1988) 175.

[64] MPI-Forum, MPI: A Message-Passing Interface Standard, Version 2.2, High Performance Computing

Center Stuttgart (HLRS), 2009.

[65] Guevara-Carrion, G., Nieto-Draghi, C., Vrabec, J., and Hasse, H., The Journal of Physical Chemistry B

112 (2008) 16664.

[66] Schnabel, T., Srivastava, A., Vrabec, J., and Hasse, H., The Journal of Physical Chemistry B 111 (2007)

9871.

[67] Towhee, http://www.towhee.sourceforge.org, 2008.

[68] Errington, J. and Panagiotopoulos, A. Z., http://kea.princeton.edu/jerring/gcmc/index.html.

[69] Hess, B., Kutzner, C., van der Spoel, D., and Lindahl, E., The Journal of Chemical Theory and Computation 4 (2008) 435.


Supplementary Material to

ms2: A Molecular Simulation Tool for Thermodynamic Properties

Stephan Deublein^a, Bernhard Eckl^b, Jürgen Stoll^b, Sergey V. Lishchuk^c, Gabriela Guevara-Carrion^a, Colin W. Glass^d, Thorsten Merker^a, Martin Bernreuther^d, Hans Hasse^a, Jadran Vrabec^e,∗

^a Lehrstuhl für Thermodynamik, Universität Kaiserslautern, 67653 Kaiserslautern, Germany

^b Institut für Technische Thermodynamik und Thermische Verfahrenstechnik, Universität Stuttgart, 70550 Stuttgart, Germany

^c Department of Mathematics, University of Leicester, Leicester LE1 7RH, United Kingdom

^d Höchstleistungsrechenzentrum Universität Stuttgart (HLRS), 70550 Stuttgart, Germany

^e Lehrstuhl für Thermodynamik und Energietechnik, Universität Paderborn, 33098 Paderborn, Germany

Abstract

This supplementary material includes detailed descriptions of the equations implemented in ms2 as well as detailed information on the input and output files of the program.

1. Deﬁnitions of the thermodynamic properties accessible in ms2

1.1. Density, pressure, internal energy and enthalpy

At constant temperature T and volume V, the pressure p is determined by [1]

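$$p = \frac{N k_B T}{V} + \frac{\langle W \rangle}{V} + \Delta p_L = \frac{N k_B T}{V} + \frac{1}{3V}\left\langle \sum_{i=1}^{N-1} \sum_{\substack{j=i+1 \\ r_{ij} < r_c}}^{N} \mathbf{r}_{ij} \cdot \mathbf{f}_{ij} \right\rangle + \Delta p_L, \tag{1}$$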

where k_B is the Boltzmann constant and the brackets ⟨...⟩ indicate the ensemble average. W denotes the virial, which is defined by the force vector f_ij acting between two molecules at a separation vector r_ij between their centers of mass. The contribution Δp_L accounts for the long-range interactions with molecules beyond the cut-off radius r_c.
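For a single configuration, the configurational estimator inside Eq. (1) can be sketched as below. The sketch assumes that the pairs within the cut-off radius have already been found, with r_ij = r_i - r_j the center-of-mass separation and f_ij the force exerted on molecule i by molecule j; the long-range correction Δp_L is passed in as a number. Averaging over sampled configurations then yields the ensemble average. The function names are hypothetical.

```python
def virial_pressure(n_molecules, kT, volume, pair_terms, dp_long_range=0.0):
    """Pressure of one configuration: N kT / V + (1 / 3V) sum r_ij . f_ij + dp_L.

    pair_terms: iterable of (r_ij, f_ij) 3-vectors for all pairs with r_ij < r_c.
    """
    virial_sum = 0.0
    for r_ij, f_ij in pair_terms:
        virial_sum += r_ij[0] * f_ij[0] + r_ij[1] * f_ij[1] + r_ij[2] * f_ij[2]
    return n_molecules * kT / volume + virial_sum / (3.0 * volume) + dp_long_range


def average_pressure(n_molecules, kT, volume, configurations, dp_long_range=0.0):
    """Ensemble average over sampled configurations (NVT: N, V and T fixed)."""
    values = [virial_pressure(n_molecules, kT, volume, pairs, dp_long_range)
              for pairs in configurations]
    return sum(values) / len(values)
```

A repulsive pair force points along r_ij, so r_ij · f_ij > 0 and the virial raises the pressure above the ideal gas value N k_B T/V; attractive pairs lower it.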

In the isothermal-isobaric ensemble, the volume is a fluctuating quantity that on average yields the volume corresponding to the specified pair of temperature T and pressure p_0. In MD simulations with Andersen's barostat [2], which is implemented in ms2, the volume fluctuations are damped by a fictive piston mass Q_p according to the equation of motion

$$\ddot{V} = \frac{p - p_0}{Q_p}. \tag{2}$$

In MC simulations, the volume is changed randomly and the new volume is accepted according to the Metropolis acceptance criterion

$$P_{\rm acc} = \min\left(1, \exp\left(-\frac{\Delta E}{k_B T}\right)\right), \tag{3}$$

∗Corresponding author: Jadran Vrabec, Warburger Str. 100, 33098 Paderborn, Germany, Tel.: +49-5251/60-2421, Fax: +49-5251/60-

3522, Email: jadran.vrabec@upb.de

Preprint submitted to Elsevier March 10, 2011

where P_acc is the probability of accepting the volume change and ΔE is the difference between the new state with energy U_new at volume V_new and the old state with energy U_old at volume V_old according to

$$\Delta E = p_0 \left(V_{\rm new} - V_{\rm old}\right) + \left(U_{\rm new} - U_{\rm old}\right) + N k_B T \ln\frac{V_{\rm old}}{V_{\rm new}}. \tag{4}$$

The residual enthalpy H^res is directly linked to the residual internal energy, pressure and volume of the system

$$H^{\rm res} = U^{\rm res} + pV - N k_B T, \tag{5}$$

where U^res is the residual potential energy. It is defined as the sum of all pairwise interaction energies u_ij plus the appropriate long-range correction ΔU_L, cf. section 2.4 of this Supplementary Material

$$U^{\rm res} = \sum_{i=1}^{N-1} \sum_{\substack{j=i+1 \\ r_{ij} < r_c}}^{N} u_{ij} + \Delta U_L. \tag{6}$$
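A minimal sketch of Eqs. (5) and (6) for single-site Lennard-Jones molecules in reduced units (ε = σ = 1) in a cubic periodic box, using the minimum image convention, which requires r_c ≤ L/2. ms2 itself handles multi-site models with point charges and multipoles; the model choice, the function names and the long-range correction passed in as a number are assumptions of this illustration.

```python
import itertools


def lj_energy(r2, eps=1.0, sigma=1.0):
    """Lennard-Jones pair energy for squared distance r2."""
    sr6 = (sigma * sigma / r2) ** 3
    return 4.0 * eps * (sr6 * sr6 - sr6)


def residual_energy(positions, box, r_cut, du_long_range=0.0):
    """Eq. (6): sum of pair energies within r_cut (minimum image) plus Delta U_L."""
    u = 0.0
    rc2 = r_cut * r_cut
    for a, b in itertools.combinations(positions, 2):  # each pair i < j once
        r2 = 0.0
        for k in range(3):
            d = a[k] - b[k]
            d -= box * round(d / box)  # minimum image in a cubic box of edge length `box`
            r2 += d * d
        if r2 < rc2:
            u += lj_energy(r2)
    return u + du_long_range


def residual_enthalpy(u_res, p, volume, n_molecules, kT):
    """Eq. (5): H_res = U_res + p V - N k_B T."""
    return u_res + p * volume - n_molecules * kT
```

Two molecules at the potential minimum, r = 2^(1/6) σ, contribute exactly -ε, which makes a convenient check of the pair loop and of the minimum image convention.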

1.2. Second derivatives

The accessible second derivatives vary with the employed ensemble. In the NVT ensemble, the residual isochoric heat capacity c_v^res is determined by fluctuations of the residual potential energy U^res

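$$c_v^{\rm res} = \frac{1}{N}\left(\frac{\partial U^{\rm res}}{\partial T}\right)_v = \frac{1}{N k_B T^2}\left(\langle (U^{\rm res})^2 \rangle - \langle U^{\rm res} \rangle^2\right). \tag{7}$$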

The partial derivative of the potential energy with respect to the volume at constant temperature, (∂U^res/∂V)_T, is determined by fluctuations of the residual potential energy U^res and the virial W

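$$\left(\frac{\partial U^{\rm res}}{\partial V}\right)_T = \frac{N}{3V}\left[\frac{1}{k_B T}\left(\langle W \rangle \langle U^{\rm res} \rangle - \frac{1}{N}\langle W U^{\rm res} \rangle\right) + \langle W \rangle\right]. \tag{8}$$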

In the NpT ensemble, the second derivatives of the Gibbs energy, namely the residual isobaric heat capacity c_p^res, the isothermal compressibility β_T and the volume expansivity α_p, are functions of ensemble fluctuations [1]. The residual isobaric heat capacity c_p^res is related to fluctuations of the residual enthalpy H^res

cres

p=1

N∂H res

∂T p=1

kB(N T )2h(Hres)2i − hHresi2. (9)

To obtain the total isobaric heat capacity cp, the solely temperature dependent ideal gas contribution cid

p(T)has

to be added. This ideal property is accessible e.g. by quantum chemical methods and can often be found in data

bases [3].

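An analogous relationship links the isothermal compressibility $\beta_T$ to volume fluctuations in the $NpT$ ensemble

$$\beta_T = -\frac{1}{V}\left(\frac{\partial V}{\partial p}\right)_T = \frac{1}{k_\mathrm{B}T\langle V\rangle}\left(\langle V^2\rangle - \langle V\rangle^2\right). \tag{10}$$

The partial derivative of the residual enthalpy with respect to pressure at constant temperature, $(\partial H^\mathrm{res}/\partial p)_T$, is linked to fluctuations of volume and residual internal energy

$$\left(\frac{\partial H^\mathrm{res}}{\partial p}\right)_T = V - \frac{N}{k_\mathrm{B}T}\left(\langle U^\mathrm{res}V\rangle - \langle U^\mathrm{res}\rangle\langle V\rangle + p\left(\langle V^2\rangle - \langle V\rangle^2\right)\right), \tag{11}$$

and the volume expansivity $\alpha_p$, again, to fluctuations of volume and residual enthalpy

$$\alpha_p = \frac{1}{V}\left(\frac{\partial V}{\partial T}\right)_p = \frac{N}{k_\mathrm{B}T^2\langle V\rangle}\left(\langle H^\mathrm{res}V\rangle - \langle H^\mathrm{res}\rangle\langle V\rangle\right). \tag{12}$$

Note that Eqs. (7) to (12) are valid for mixtures as well.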
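In the same spirit, Eqs. (10) and (12) can be evaluated from $NpT$ samples of the volume and the residual enthalpy. This sketch assumes reduced units ($k_\mathrm{B} = 1$ by default) and that each `h_res` sample is the per-molecule residual enthalpy, consistent with the prefactor $N$ in Eq. (12); the covariance is computed from deviations about the means to limit round-off. Function names are hypothetical.

```python
import statistics


def beta_t(volumes, temperature, k_b=1.0):
    """Isothermal compressibility from volume fluctuations, Eq. (10)."""
    return statistics.pvariance(volumes) / (k_b * temperature * statistics.fmean(volumes))


def alpha_p(volumes, h_res, n, temperature, k_b=1.0):
    """Volume expansivity from the cross-fluctuation of V and H^res, Eq. (12)."""
    mean_v = statistics.fmean(volumes)
    mean_h = statistics.fmean(h_res)
    # <H^res V> - <H^res><V>, written as the mean product of deviations
    cov = statistics.fmean((v - mean_v) * (h - mean_h) for v, h in zip(volumes, h_res))
    return n * cov / (k_b * temperature ** 2 * mean_v)
```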

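1.3. Speed of sound

The speed of sound $c$ follows from the isothermal compressibility $\beta_T$, the volume expansivity $\alpha_p$, the (total) isobaric heat capacity $c_p$ and the temperature $T$ by

$$c = \left(\frac{1}{M\left(\beta_T\rho - T\alpha_p^2/c_p\right)}\right)^{0.5}, \tag{13}$$

where $M$ is the molar mass and $\rho$ the density; with $\rho$ taken as a molar density, $c_p$ is the molar isobaric heat capacity. In ms2, the speed of sound is calculated both for pure components and for mixtures in the $NpT$ ensemble.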
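Eq. (13) is a one-line formula once the $NpT$ results are available. The sketch below assumes SI units throughout, with $\rho$ as molar density and $c_p$ as the total molar isobaric heat capacity (residual plus ideal gas part). As a sanity check, an ideal monatomic gas ($\beta_T = 1/p$, $\alpha_p = 1/T$, $c_p = 5R/2$) must give $c = \sqrt{5RT/(3M)}$.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol K)


def speed_of_sound(beta_t, alpha_p, c_p, rho, temperature, molar_mass):
    """Speed of sound from Eq. (13), all quantities in SI units.

    beta_t:     isothermal compressibility in 1/Pa
    alpha_p:    volume expansivity in 1/K
    c_p:        total molar isobaric heat capacity in J/(mol K)
    rho:        molar density in mol/m^3
    molar_mass: M in kg/mol
    """
    return (1.0 / (molar_mass * (beta_t * rho - temperature * alpha_p ** 2 / c_p))) ** 0.5


# ideal-gas check for argon at 300 K and 0.1 MPa
T, p, M = 300.0, 1.0e5, 0.039948
c = speed_of_sound(1.0 / p, 1.0 / T, 2.5 * R, p / (R * T), T, M)
```

For argon this check gives about 322.6 m/s, the familiar ideal-gas value $\sqrt{\gamma RT/M}$ with $\gamma = 5/3$.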

1.4. Chemical potential - Widom test molecule insertion

To calculate the chemical potential of component $i$ according to Widom [4], a so-called "test" molecule $l$ of component $i$ is inserted into the simulation volume at a random position with a random orientation. At constant temperature and pressure, its potential energy $\psi_l$, due to its interactions with all other "real" molecules, is related to the chemical potential according to

$$\mu_i - \mu_i^\mathrm{id}(T) = -k_\mathrm{B}T\ln\frac{\left\langle V\exp\left(-\psi_l/k_\mathrm{B}T\right)\right\rangle}{\left\langle N_i\right\rangle}. \tag{14}$$

The test molecule is removed immediately after its potential energy has been calculated, so it influences neither the time evolution of the system (MD) nor the Markov chain (MC).

The value of $\psi_l$ depends strongly on the random position of the test molecule. In addition, the density of the system has a significant influence on the accuracy of the calculation. For very dense fluids, test molecules almost always overlap with some real molecules, which leads to a potential energy $\psi_l \to \infty$ and thus to no contribution to Eq. (14), resulting in poor statistics for the chemical potential or even complete failure of the sampling. Within limits, the statistical uncertainty of the chemical potential can be lowered by inserting a large number of test molecules into the simulation volume, at the price of an increasing computational demand.
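A sketch of the estimator in Eq. (14), evaluated from a list of test insertions collected during an $NpT$ run, is shown below. Overlapping insertions are passed as `math.inf` and contribute zero weight, which mirrors the statistical problem described above. The log-sum-exp shift is a standard numerical safeguard and not necessarily what ms2 does; reduced units ($k_\mathrm{B} = 1$) are assumed by default, and the function name is hypothetical.

```python
import math


def widom_mu_res(volumes, psi, mean_n_i, temperature, k_b=1.0):
    """Estimate mu_i - mu_i^id(T) from Eq. (14) in the NpT ensemble.

    volumes[k]: simulation volume when test molecule k was inserted
    psi[k]:     potential energy of test molecule k (math.inf for an overlap)
    mean_n_i:   ensemble average <N_i> of real molecules of component i
    """
    beta = 1.0 / (k_b * temperature)
    # log of V * exp(-psi / k_B T); overlapping insertions have zero weight
    logs = [math.log(v) - beta * p for v, p in zip(volumes, psi) if math.isfinite(p)]
    if not logs:
        raise ValueError("every test molecule overlapped: sampling failed")
    shift = max(logs)  # log-sum-exp shift guards against overflow
    log_mean = (shift + math.log(sum(math.exp(x - shift) for x in logs))
                - math.log(len(psi)))  # average over all insertions, overlaps included
    return -k_b * temperature * (log_mean - math.log(mean_n_i))
```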


1.5. Self-diffusion coefficients

The self-diffusion coefficient $D_i$ is related to the mass flux of single molecules within a fluid. Therefore, the relevant Green-Kubo expression is based on the individual molecule velocity autocorrelation function [5]

$$D_i = \frac{1}{3N_i}\int_0^\infty \mathrm{d}t\,\left\langle\sum_{k=1}^{N_i}\mathbf{v}_k(t)\cdot\mathbf{v}_k(0)\right\rangle, \tag{15}$$

where $\mathbf{v}_k(t)$ is the center of mass velocity vector of molecule $k$ at time $t$. Eq. (15) is an average over all $N_i$ molecules of component $i$ in the ensemble, since all of them contribute to the self-diffusion coefficient.
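Eq. (15) can be evaluated from stored center of mass velocities by averaging the velocity autocorrelation over many time origins and integrating it numerically. The NumPy sketch below assumes equally spaced samples, truncates the integral after `max_lag` steps (the correlation must have decayed by then) and uses the trapezoidal rule; ms2's own correlation settings (`Corrlength`, `SpanCorrFun` in Table 2) are not modeled, and the function name is hypothetical.

```python
import numpy as np


def self_diffusion(velocities, dt, max_lag):
    """Green-Kubo estimate of D_i following Eq. (15).

    velocities: array of shape (n_steps, n_i, 3) with the center of mass velocities
                of the N_i molecules of component i, sampled every dt.
    max_lag:    number of sampling intervals after which the integral is truncated.
    """
    v = np.asarray(velocities, dtype=float)
    n_steps, n_i, _ = v.shape
    if not 0 < max_lag < n_steps:
        raise ValueError("max_lag must lie between 1 and n_steps - 1")
    vacf = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        # sum_k v_k(t0 + lag) . v_k(t0), averaged over all available time origins t0
        vacf[lag] = np.einsum("tkd,tkd->t", v[lag:], v[: n_steps - lag]).mean()
    # trapezoidal rule over the sampled correlation function
    integral = dt * (vacf.sum() - 0.5 * (vacf[0] + vacf[-1]))
    return integral / (3.0 * n_i)
```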


2. Details of ms2

This section provides a closer insight into ms2 and the implemented equations.

2.1. Reduced quantities

The simulation program ms2 internally uses reduced quantities for its calculations. All quantities are reduced by a reference length $\sigma_R$, a reference energy $\varepsilon_R$ and a reference mass $m_R$.

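Table 1: Important physical quantities in their reduced form. Note that $\varepsilon_0$ denotes the permittivity of the vacuum, $\varepsilon_0 = 8.854187\times10^{-12}$ A² s⁴ kg⁻¹ m⁻³.

| Quantity | Reduced form |
|---|---|
| length | $l^* = l/\sigma_R$ |
| energy | $u^* = u/\varepsilon_R$ |
| mass | $m^* = m/m_R$ |
| time | $t^* = \dfrac{t}{\sigma_R}\sqrt{\dfrac{\varepsilon_R}{m_R}}$ |
| piston mass | $Q_p^* = \dfrac{Q_p\sigma_R^4}{m_R}$ |
| temperature | $T^* = \dfrac{Tk_\mathrm{B}}{\varepsilon_R}$ |
| pressure | $p^* = \dfrac{p\sigma_R^3}{\varepsilon_R}$ |
| density | $\rho^* = \rho\sigma_R^3 N_\mathrm{A}$ |
| volume | $V^* = V/\sigma_R^3$ |
| chemical potential | $\tilde\mu = \mu/(k_\mathrm{B}T)$ |
| point charge | $q^* = \dfrac{1}{\sqrt{4\pi\varepsilon_0}}\dfrac{q}{\sqrt{\varepsilon_R\sigma_R}}$ |
| dipole moment | $\mu^* = \dfrac{1}{\sqrt{4\pi\varepsilon_0}}\dfrac{\mu}{\sqrt{\varepsilon_R\sigma_R^3}}$ |
| quadrupole moment | $Q^* = \dfrac{1}{\sqrt{4\pi\varepsilon_0}}\dfrac{Q}{\sqrt{\varepsilon_R\sigma_R^5}}$ |
| diffusion coefficient | $D^* = \dfrac{D}{\sigma_R}\sqrt{\dfrac{m_R}{\varepsilon_R}}$ |
| viscosity | $\eta^* = \dfrac{\eta\sigma_R^2}{\sqrt{m_R\varepsilon_R}}$ |
| thermal conductivity | $\lambda^* = \dfrac{\lambda\sigma_R^2}{k_\mathrm{B}}\sqrt{\dfrac{m_R}{\varepsilon_R}}$ |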
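The conversions of Table 1 can be sketched as a small helper class. The reference values in the usage line are those of the example *.par file in Table 3 (LengthUnit = 3.0 Å, EnergyUnit = 100 K, MassUnit = 100 u). The class and its method names are hypothetical; the constants are exact SI values except for the atomic mass unit, which follows the value quoted in Table 2.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant in J/K
N_A = 6.02214076e23    # Avogadro constant in 1/mol
U = 1.6605e-27         # atomic mass unit in kg, as quoted in Table 2


class ReferenceUnits:
    """Reference length, energy and mass of Table 1 (hypothetical helper)."""

    def __init__(self, sigma_angstrom, eps_over_kb, mass_u):
        self.sigma = sigma_angstrom * 1e-10  # m
        self.eps = eps_over_kb * K_B         # J
        self.mass = mass_u * U               # kg

    def temperature(self, t_kelvin):
        return t_kelvin * K_B / self.eps

    def pressure(self, p_pa):
        return p_pa * self.sigma ** 3 / self.eps

    def density(self, rho_mol_per_m3):
        return rho_mol_per_m3 * self.sigma ** 3 * N_A

    def time(self, t_s):
        return t_s / self.sigma * math.sqrt(self.eps / self.mass)

    def diffusion(self, d_m2_per_s):
        return d_m2_per_s / self.sigma * math.sqrt(self.mass / self.eps)


ref = ReferenceUnits(3.0, 100.0, 100.0)
t_star = ref.temperature(400.0)  # the 400 K of Table 3, i.e. T* close to 4
```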

2.2. Intermolecular interactions

Point dipole interactions. The interaction between a point dipole with moment $\mu_i$ and a point charge $q_j$ at a distance $r_{ij}$ is given by [6, 7]

$$u_{ij}^{Dq}(r_{ij},\theta_i,\mu_i,q_j) = -\frac{1}{4\pi\varepsilon_0}\frac{\mu_i q_j}{r_{ij}^2}\cos\theta_i. \tag{16}$$

Here, $\theta_i$ is the angle between the distance vector of the point charge and the orientation vector of the point dipole, as illustrated in Figure 2 in the associated paper.

Point quadrupole interactions. The interaction potential of a linear point quadrupole $Q_i$ with a point charge $q_j$ at a distance $r_{ij}$ is given by [1, 8]

$$u_{ij}^{Qq}(r_{ij},\theta_i,Q_i,q_j) = \frac{1}{4\pi\varepsilon_0}\frac{Q_i q_j}{4r_{ij}^3}\left(3\cos^2\theta_i - 1\right), \tag{17}$$

where $\theta_i$ is the angle between the distance vector of the point charge and the orientation vector of the point quadrupole, as illustrated in Figure 2 in the associated paper.
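Eqs. (16) and (17) can be evaluated directly in SI units. The sketch below uses the unit conversions quoted in Section 4.2 for the Debye, the Buckingham and the elementary charge, and $\varepsilon_0$ from Table 1; the angle $\theta_i$ is passed in radians. Function names are illustrative.

```python
import math

EPS0 = 8.854187e-12            # vacuum permittivity from Table 1, A^2 s^4 kg^-1 m^-3
ELEMENTARY_CHARGE = 1.602e-19  # C, as quoted in Section 4.2
DEBYE = 3.33564e-30            # C m
BUCKINGHAM = 3.33564e-40       # C m^2
COULOMB_FACTOR = 1.0 / (4.0 * math.pi * EPS0)


def u_dipole_charge(r, theta_i, mu_i, q_j):
    """Eq. (16): point dipole mu_i (C m) and point charge q_j (C) at distance r (m)."""
    return -COULOMB_FACTOR * mu_i * q_j / r ** 2 * math.cos(theta_i)


def u_quadrupole_charge(r, theta_i, quad_i, q_j):
    """Eq. (17): linear point quadrupole quad_i (C m^2) and point charge q_j (C)."""
    return (COULOMB_FACTOR * quad_i * q_j / (4.0 * r ** 3)
            * (3.0 * math.cos(theta_i) ** 2 - 1.0))
```

The quadrupole-charge energy vanishes at the "magic" angle $\cos^2\theta_i = 1/3$, which is a convenient check of the angular factor.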

The interaction between a linear point quadrupole with moment $Q_i$ and a point dipole with moment $\mu_j$ is given by [6, 8]

$$u_{ij}^{Q\mu}(r_{ij},\theta_i,\theta_j,\phi_{ij},Q_i,\mu_j) = \frac{1}{4\pi\varepsilon_0}\frac{3}{2}\frac{Q_i\mu_j}{r_{ij}^4}\left(\cos\theta_i - \cos\theta_j\right)\left(1 + 3\cos\theta_i\cos\theta_j - 2\cos\phi_{ij}\sin\theta_i\sin\theta_j\right), \tag{18}$$

where the angles $\theta_i$, $\theta_j$ and $\phi_{ij}$ indicate the relative angular orientation of the point dipole $j$ and the point quadrupole $i$, as shown in Figure 2 in the associated paper.

2.3. Reaction field method

The truncation of first- and second-order electrostatic interactions is corrected for with the reaction field method [9, 10]. Here, all dipoles within the cut-off radius $r_c$ polarize the fluid surrounding the cut-off sphere, which is modeled as a dielectric continuum with a relative permittivity $\varepsilon_s$. The polarization gives rise to a homogeneous electric field within the cut-off sphere, called the reaction field $\mathbf{E}_i^\mathrm{RF}$, given by

$$\mathbf{E}_i^\mathrm{RF} = \frac{1}{4\pi\varepsilon_0}\frac{2(\varepsilon_s - 1)}{2\varepsilon_s + 1}\frac{1}{r_c^3}\sum_{b=1}^{m}\sum_{\substack{j=1\\ r_{ij}<r_c}}^{N_b}\boldsymbol{\mu}_{b,j}. \tag{19}$$

Note that all $m$ dipoles with moment $\boldsymbol{\mu}$ on all $N_b$ molecules within the cut-off sphere have to be summed up. The reaction field, which originates from the continuum outside of the cut-off sphere, interacts with a dipole $\boldsymbol{\mu}_i$ at the center of the cut-off sphere by [1, 11]

$$u_i^{\mu\mathrm{RF}} = -\frac{1}{2}\boldsymbol{\mu}_i\cdot\mathbf{E}_i^\mathrm{RF}, \tag{20}$$

where $u_i^{\mu\mathrm{RF}}$ is its contribution to the potential energy. In Eq. (20), it is assumed that the system is sufficiently large, e.g. $N \geq 500$, so that tinfoil boundary conditions ($\varepsilon_s \to \infty$) are applicable without a loss of accuracy [12, 13, 14, 15].
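A sketch of Eqs. (19) and (20) in reduced units, where the factor $1/(4\pi\varepsilon_0)$ is absorbed into the reduced dipole moments of Table 1, is given below. It uses the standard reaction field coefficient $2(\varepsilon_s - 1)/(2\varepsilon_s + 1)$, which vanishes for $\varepsilon_s = 1$ and tends to 1 under tinfoil conditions. The caller is assumed to have already collected all dipole vectors within the cut-off sphere; function names are hypothetical.

```python
def reaction_field(dipoles, rc, eps_s):
    """Eq. (19) in reduced units: field from all dipole vectors inside the cut-off sphere."""
    coefficient = 2.0 * (eps_s - 1.0) / (2.0 * eps_s + 1.0) / rc ** 3
    total = [sum(mu[k] for mu in dipoles) for k in range(3)]  # vector sum of the dipoles
    return [coefficient * t for t in total]


def reaction_field_energy(mu_i, e_rf):
    """Eq. (20): u_i = -1/2 mu_i . E_i^RF."""
    return -0.5 * sum(a * b for a, b in zip(mu_i, e_rf))
```

With `Epsilon = 1.0E+10`, the value recommended in Table 2, the coefficient differs from 1 only by about $10^{-10}$.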

The reaction field method can also be applied to correct for sets of point charges, as long as they add up to a total charge of zero. To this end, a point charge distribution is reduced to a dipole vector $\boldsymbol{\mu}_q$ according to

$$\boldsymbol{\mu}_q = \sum_{i=1}^{n}\mathbf{r}_i q_i, \tag{21}$$

where $n$ denotes the total number of point charges and $\mathbf{r}_i$ the position vector of charge $q_i$. The resulting dipole $\boldsymbol{\mu}_q$ is then inserted into the correction term according to Eqs. (19) and (20).
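For the methanol model of Table 4 (three point charges that add up to zero), Eq. (21) can be evaluated as in the following sketch. Because the charges are neutral, the result does not depend on the choice of origin, a property that is easy to check by shifting all positions. The conversion to Debye uses the constants quoted in Section 4.2; the names are illustrative.

```python
ELEMENTARY_CHARGE = 1.602e-19  # C, as quoted in Section 4.2
DEBYE = 3.33564e-30            # C m, as quoted in Section 4.2

# (x, y, z) in Angstrom and charge in e for the methanol model of Table 4
SITES = [((7.660331e-01, 1.338147e-02, 0.0), 0.247461),
         ((-6.564695e-01, -6.389332e-02, 0.0), -0.678742),
         ((-1.004989e00, 8.145993e-01, 0.0), 0.431281)]


def charge_dipole(sites):
    """Eq. (21): mu_q = sum_i r_i q_i, returned in e Angstrom."""
    return tuple(sum(r[k] * q for r, q in sites) for k in range(3))


def magnitude_debye(mu_e_angstrom):
    """Convert |mu_q| from e Angstrom to Debye."""
    norm = sum(c * c for c in mu_e_angstrom) ** 0.5
    return norm * ELEMENTARY_CHARGE * 1e-10 / DEBYE
```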

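2.4. Cut-off mode

In the site-site cut-off mode, the LJ contributions of molecules beyond the cut-off radius $r_c$ to the internal energy are estimated by [1]

$$\Delta U_\mathrm{L}^* = \frac{8}{9}N\rho^*\pi\left[\left(\frac{1}{r_c^*}\right)^9 - 3\left(\frac{1}{r_c^*}\right)^3\right]. \tag{22}$$

The contributions to the pressure are considered by [1]

$$\Delta p_\mathrm{L}^* = \frac{32}{9}(\rho^*)^2\pi\left[\left(\frac{1}{r_c^*}\right)^9 - \frac{3}{2}\left(\frac{1}{r_c^*}\right)^3\right]. \tag{23}$$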

Using the center of mass cut-off mode, the LJ contributions of molecules beyond the cut-off radius $r_c$ are estimated on the basis of the correction terms of Lustig [16]. The correction $\Delta u_\mathrm{L}^*$ for the residual internal energy can be divided into three contributions: those by interactions between two molecule centers (TICC), between one molecule center and one site (TICS), and between two molecule sites (TISS). The TICC terms describe the long-range contributions between molecular interaction sites that are both positioned in the center of mass of their molecules. The TICS terms describe the contributions where one site is located in the center of mass of its molecule, while the other site is not. The TISS terms correct for contributions between molecular sites that are both not positioned in the center of mass of their molecules. Using these formulations, the correction term for the residual internal energy is defined by

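$$\Delta u_\mathrm{L}^* = 2\pi\rho\sum_{i=1}^{N_C}\sum_{j=1}^{N_C}\sum_{\alpha=1}^{N_{\mathrm{LJ},i}}\sum_{\beta=1}^{N_{\mathrm{LJ},j}} x_i x_j 4\varepsilon\begin{cases}\mathrm{TICC_u}(-6) - \mathrm{TICC_u}(-3) & \text{both sites in the center of mass}\\ \mathrm{TICS_u}(-6) - \mathrm{TICS_u}(-3) & \text{one site in the center of mass}\\ \mathrm{TISS_u}(-6) - \mathrm{TISS_u}(-3) & \text{no site in the center of mass}\end{cases} \tag{24}$$

Here, $N_C$ is the number of components in the system, $x_i$ the mole fraction of component $i$ and $N_{\mathrm{LJ},i}$ the number of LJ sites of the molecule of component $i$. The TIXX$_\mathrm{u}$ functions are calculated by the following equations, where the argument in parentheses defines the exponent $n$ and the signs are such that $\mathrm{TICC_u}(n)$ equals the tail integral $\int_{r_c}^\infty r^2 (r/\sigma)^{2n}\,\mathrm{d}r$

$$\mathrm{TICC_u}(n) = -\frac{r_c^{2n+3}}{\sigma^{2n}(2n+3)}, \tag{25}$$

$$\mathrm{TICS_u}(n) = -\frac{(r_c+\tau)^{2n+3} - (r_c-\tau)^{2n+3}}{4\sigma^{2n}\tau(n+1)(2n+3)}\,r_c + \frac{(r_c+\tau)^{2n+4} - (r_c-\tau)^{2n+4}}{4\sigma^{2n}\tau(n+1)(2n+3)(2n+4)}, \tag{26}$$

$$\mathrm{TISS_u}(n) = -\frac{(r_c+\tau_+)^{2n+4} - (r_c+\tau_-)^{2n+4} - (r_c-\tau_-)^{2n+4} + (r_c-\tau_+)^{2n+4}}{8\sigma^{2n}\tau_1\tau_2(n+1)(2n+3)(2n+4)}\,r_c + \frac{(r_c+\tau_+)^{2n+5} - (r_c+\tau_-)^{2n+5} - (r_c-\tau_-)^{2n+5} + (r_c-\tau_+)^{2n+5}}{8\sigma^{2n}\tau_1\tau_2(n+1)(2n+3)(2n+4)(2n+5)}. \tag{27}$$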

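The terms $\tau_+$ and $\tau_-$ are defined by

$$\tau_+ = \tau_1 + \tau_2, \tag{28}$$

$$\tau_- = \tau_1 - \tau_2, \tag{29}$$

where $\tau_1$ and $\tau_2$ denote the distances of sites 1 and 2, respectively, from the center of mass of the molecule they belong to. In Eq. (26), $\tau$ is the corresponding distance of the site that is not located in the center of mass.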

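The correction of the pressure for the center of mass cut-off is given by

$$\Delta p_\mathrm{L}^* = -\frac{2}{3}\pi\rho^2\sum_{i=1}^{N_C}\sum_{j=1}^{N_C}\sum_{\alpha=1}^{N_{\mathrm{LJ},i}}\sum_{\beta=1}^{N_{\mathrm{LJ},j}} x_i x_j 4\varepsilon\begin{cases}\mathrm{TICC_p}(-6) - \mathrm{TICC_p}(-3) & \text{both sites in the center of mass}\\ \mathrm{TICS_p}(-6) - \mathrm{TICS_p}(-3) & \text{one site in the center of mass}\\ \mathrm{TISS_p}(-6) - \mathrm{TISS_p}(-3) & \text{no site in the center of mass}\end{cases} \tag{30}$$

These expressions follow the same naming scheme as the internal energy correction. The functions $\mathrm{TICC_p}$, $\mathrm{TICS_p}$ and $\mathrm{TISS_p}$ are defined by

$$\mathrm{TICC_p}(n) = 2n\cdot\mathrm{TICC_u}(n), \tag{31}$$

$$\mathrm{TICS_p}(n) = -\frac{(r_c+\tau)^{2n+2} - (r_c-\tau)^{2n+2}}{4\sigma^{2n}\tau(n+1)}\,r_c^2 - 3\cdot\mathrm{TICS_u}(n), \tag{32}$$

$$\mathrm{TISS_p}(n) = -\frac{(r_c+\tau_+)^{2n+3} - (r_c+\tau_-)^{2n+3} - (r_c-\tau_-)^{2n+3} + (r_c-\tau_+)^{2n+3}}{8\sigma^{2n}\tau_1\tau_2(n+1)(2n+3)}\,r_c^2 - 3\cdot\mathrm{TISS_u}(n). \tag{33}$$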
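The TIXX functions are easy to mistype, so a direct transcription with built-in limit checks is useful. The Python sketch below assumes the sign convention in which $\mathrm{TICC_u}(n)$ equals the tail integral $\int_{r_c}^\infty r^2 (r/\sigma)^{2n}\,\mathrm{d}r$. With that convention, $\mathrm{TICS_u}$ reduces to $\mathrm{TICC_u}$ for $\tau \to 0$, $\mathrm{TISS_u}$ reduces to $\mathrm{TICS_u}$ for $\tau_2 \to 0$, the same holds for the pressure functions, and a single LJ site in the center of mass reproduces Eqs. (22) and (23). Here $\sigma$ and $\varepsilon$ are taken to be those of the site pair α-β (an assumption of this sketch), and all helper names are hypothetical.

```python
import math


def ticc_u(n, rc, sigma):
    """Eq. (25): int_rc^inf r^2 (r/sigma)^(2n) dr for 2n + 3 < 0."""
    return -rc ** (2 * n + 3) / (sigma ** (2 * n) * (2 * n + 3))


def tics_u(n, rc, sigma, tau):
    """Eq. (26): one site in the center of mass, the other at distance tau from it."""
    d = 4 * sigma ** (2 * n) * tau * (n + 1) * (2 * n + 3)
    a = (rc + tau) ** (2 * n + 3) - (rc - tau) ** (2 * n + 3)
    b = (rc + tau) ** (2 * n + 4) - (rc - tau) ** (2 * n + 4)
    return -a * rc / d + b / (d * (2 * n + 4))


def _comb(rc, tau1, tau2, m):
    """Four-term bracket shared by Eqs. (27) and (33)."""
    tp, tm = tau1 + tau2, tau1 - tau2  # tau_+ and tau_- of Eqs. (28) and (29)
    return (rc + tp) ** m - (rc + tm) ** m - (rc - tm) ** m + (rc - tp) ** m


def tiss_u(n, rc, sigma, tau1, tau2):
    """Eq. (27): both sites off the center of mass, at distances tau1 and tau2."""
    d = 8 * sigma ** (2 * n) * tau1 * tau2 * (n + 1) * (2 * n + 3) * (2 * n + 4)
    return (-_comb(rc, tau1, tau2, 2 * n + 4) * rc / d
            + _comb(rc, tau1, tau2, 2 * n + 5) / (d * (2 * n + 5)))


def ticc_p(n, rc, sigma):
    """Eq. (31)."""
    return 2 * n * ticc_u(n, rc, sigma)


def tics_p(n, rc, sigma, tau):
    """Eq. (32)."""
    a = (rc + tau) ** (2 * n + 2) - (rc - tau) ** (2 * n + 2)
    return (-a * rc ** 2 / (4 * sigma ** (2 * n) * tau * (n + 1))
            - 3 * tics_u(n, rc, sigma, tau))


def tiss_p(n, rc, sigma, tau1, tau2):
    """Eq. (33)."""
    d = 8 * sigma ** (2 * n) * tau1 * tau2 * (n + 1) * (2 * n + 3)
    return (-_comb(rc, tau1, tau2, 2 * n + 3) * rc ** 2 / d
            - 3 * tiss_u(n, rc, sigma, tau1, tau2))


def one_site_tails(rho, rc, sigma=1.0, eps=1.0):
    """Eqs. (24) and (30) for a pure fluid of one LJ site in the center of mass."""
    du = 2 * math.pi * rho * 4 * eps * (ticc_u(-6, rc, sigma) - ticc_u(-3, rc, sigma))
    dp = -2 / 3 * math.pi * rho ** 2 * 4 * eps * (ticc_p(-6, rc, sigma) - ticc_p(-3, rc, sigma))
    return du, dp
```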

3. Modular structure - example

The modular structure of ms2 is illustrated here by following the calculation process of an MD simulation of a pure fluid, modeled by two LJ sites and two point charges, in the $NVT$ ensemble. During initialization, the modules for memory allocation and for the definition of the simulation scenario are executed. Initialization is not confined to one of the hierarchical levels but takes place on all levels, cf. Figure 4 in the associated paper.

On level one, the simulation is restricted to "molecular dynamics", so all modules concerning MC simulation are skipped entirely. On level two, the "initialization" modules define the simulation environment by setting ensemble-specific properties, such as simulation volume and temperature, as well as the thermostat and the integration algorithm. Whereas these modules are required in every simulation, MD or MC, other modules, e.g. the trajectory calculation, are specific to MD. The modular structure retains the efficiency of the program by initiating only the molecule propagation that is specific to the chosen simulation technique. The same principle applies to the initialization on lower levels. Here, the composition of the molecular species in the simulation volume (module component), the description of the molecule (module molecule) and the storage of the positions, velocities and forces of all LJ sites and charges (module sites) are determined. All other modules, e.g. those treating additional components of mixtures or holding the characteristic properties of dipoles or quadrupoles, are omitted by ms2 in this case, because this simulation does not need them.

Once the simulation has been initialized, a set of modules that maintains the simulation run is invoked. On level one, this includes modules of global use for any simulation, e.g. for limiting the execution time. On level two, Newton's equations of motion are integrated numerically, which requires information about the fluid's composition, the positions and orientations of the molecules etc., provided by the modules of the respective level. The thermostat is applied and the thermodynamic properties are calculated. The forces, torques and energies are calculated in the second branch on the level component. For an MD simulation, this covers routines that calculate energies and forces at the same time, whereas for an MC simulation, the modules only calculate energies and virial contributions. A further set of modules accumulates data: these modules store data and evaluate the statistics of the produced data. The actual thermodynamic properties are calculated on the ensemble level. The last set of modules handles the output, where the results of the molecular simulation are written to file. These modules are called in all simulation scenarios, independently of the user settings.

4. Input and output

ms2 was designed to be a simulation program that is easy to apply. The structure of the input files as well as of the output files is shown in Figure 1.

Figure 1: File structure needed and generated by ms2 (input files: *.par, *.pm; output files: *.log, *.res, *.run, *.rav, *.rtr, *.rst, *.vim, *.nrm).

4.1. Input file *.par

The simulation program ms2 requires one input file (*.par) to specify the simulation parameters and one molecular model file (*.pm) for every molecular species considered. The *.par file contains all input variables for the simulation process, such as simulation type, ensemble, number of equilibration and production steps, time step length etc. Furthermore, the user has to specify temperature, density and fluid composition. Table 2 lists all input parameters and options to be specified in the *.par file. An example of a complete *.par file is given below for an MC simulation of pure ethylene oxide in the $NpT$ ensemble, where the chemical potential is calculated by gradual insertion.

Table 2: Parameters and options specified in the *.par file.

| Parameter | Option | Explanation | Recommended value |
|---|---|---|---|
| Units | SI | Physical properties in the *.par file are given in SI units | SI |
| | reduced | Physical properties in the *.par file are given in reduced units with respect to the reference values of length $\sigma_R$, energy $\varepsilon_R$ and mass $m_R$ | |
| LengthUnit | | Reference length $\sigma_R$ in Å | 3.0 |
| EnergyUnit | | Reference energy $\varepsilon_R/k_\mathrm{B}$ in K | 100.0 |
| MassUnit | | Reference mass $m_R$ in atomic units (u = 1.6605 × 10⁻²⁷ kg) | 100.0 |
| Simulation | MD | Molecular dynamics simulation | |
| | MC | Monte-Carlo simulation | |
| Integrator | Gear | Gear predictor-corrector integrator (MD only) | Gear |
| | Leapfrog | Leapfrog integrator (MD only) | |
| TimeStep | | Time step of one MD step in fs (MD only) | ∼1 |
| Acceptance | | Acceptance rate for MC moves (MC only) | 0.5 |
| Ensemble | NVT | Canonical ensemble | |
| | NVE | Micro-canonical ensemble | |
| | NPT | Isobaric-isothermal ensemble | |
| | GE | Grand equilibrium method (pseudo-$\mu VT$) | |
| MCORSteps | | MC relaxation loops for pre-equilibration | 100 |
| NVTSteps | | Number of equilibration time steps (MD) or loops (MC) in the $NVT$ ensemble | 20000 |
| NPTSteps | | Number of equilibration time steps (MD) or loops (MC) in the $NpT$ ensemble (optional) | 50000 |
| RunSteps | | Number of production time steps (MD) or loops (MC) | 300000 |
| ResultFreq | | Size of block averages in time steps or loops | 100 |
| ErrorFreq | | Frequency of writing the *.res file in time steps or loops | 5000 |
| VisualFreq | | Frequency of saving configurations in the *.vim file for visualization in time steps (MD) or loops (MC) | 5000 |
| CutoffMode | COM | Center of mass cut-off | COM |
| | Site | Site-site cut-off | |
| NEnsemble | | Number of ensembles in the simulation | 1 |
| CorrfunMode | yes | Calculation of autocorrelation functions enabled | |
| | no | Calculation of autocorrelation functions disabled | |
| Temperature | | Specified temperature | |
| Pressure | | Specified pressure | |
| Density | | Specified density | |
| PistonMass | | Piston mass for simulations at constant pressure | |
| NParticles | | Total number of molecules | 864 to 4000 |
| Liqdensity | | Simulation result density | |
| VarLiqDensity | | Statistical uncertainty of density | |
| LiqEnthaly | | Simulation result residual enthalpy | |
| VarLiqEnthaly | | Statistical uncertainty of residual enthalpy | |
| LiqBetaT | | Simulation result isothermal compressibility | |
| VarLiqBetaT | | Statistical uncertainty of isothermal compressibility | |
| LiqdHdP | | Simulation result $(\partial h^\mathrm{res}/\partial p)_T$ | |
| VardHdP | | Statistical uncertainty of $(\partial h^\mathrm{res}/\partial p)_T$ | |
| NComponents | | Number of components in the simulation | |
| Corrlength | | Length of the autocorrelation in time steps | |
| SpanCorrFun | | Time steps separating subsequent autocorrelation functions | |
| ViewCorrFun | | Output frequency of the full autocorrelation functions into the *.rtr file | |
| ResultFreqCF | | Output frequency of transport properties into the *.res file | |
| PotModel | | Potential model *.pm file of a component | |
| MolarFract | | Molar fraction of a component | |
| ChemPotMethod | none | No calculation of the chemical potential for this component | none |
| | Widom | Calculation of the chemical potential for this component using Widom's test molecule method | |
| | GradIns | Calculation of the chemical potential for this component using gradual insertion | |
| NTest | | Number of test molecules for Widom's test molecule method | 2000 |
| WeightFactors | Guess | For gradual insertion: use user-defined initial values for the weight factors, with optimization of these factors during the simulation | Guess |
| | OptSet | For gradual insertion: user-defined values for the weight factors without adjustment during the simulation | |
| Cutoff | | Cut-off radius for center of mass cut-off | |
| CutoffLJ | | Cut-off radius for LJ interactions (site-site cut-off) | |
| CutoffDD | | Cut-off radius for dipole-dipole interactions (site-site cut-off) | |
| CutoffDQ | | Cut-off radius for dipole-quadrupole interactions (site-site cut-off) | |
| CutoffQQ | | Cut-off radius for quadrupole-quadrupole interactions (site-site cut-off) | |
| Epsilon | | Dielectric constant | 1.0E+10 |

An example of a *.par file is given in Table 3. The scenario is an MC simulation of ethylene oxide in the $NpT$ ensemble.

Table 3: Parameters and options specified in the *.par file.

Sim EOX.par
Units = SI
LengthUnit = 3.0
EnergyUnit = 100.0
MassUnit = 100.0
Simulation = MC
Acceptance = 0.5
Ensemble = NVT
MCORSteps = 100
NVTSteps = 2000
NPTSteps = 10000
RunSteps = 50000
ResultFreq = 100
ErrorsFreq = 2000
VisualFreq = 10
CutoffMode = COM
NEnsembles = 1
Temperature = 400.0
Pressure = 0.79355
Density = 21.09227
PistonMass = 0.00001
NParticles = 500
NComponents = 1
PotModel = eox.pm
MolarFract = 1.0
ChemPotMethod = GradIns
WeightFactors = Guess
1.00
2.00
3.00
4.00
5.00
10.00
20.00
40.00
60.00
Cutoff = 5.0
Epsilon = 1.E+10
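ms2 reads the *.par file itself, but post-processing scripts sometimes need the same settings. The following Python sketch is a hypothetical reader for the `Key = Value` layout of Table 3, not ms2's actual parser. It assumes that bare lines following a key (such as the weight factors after `WeightFactors = Guess`) are numeric and collects them under `<Key>.values`; a leading title line without `=` is skipped.

```python
def parse_par(text):
    """Parse 'Key = Value' lines of a *.par file into a dict (hypothetical helper).

    Values are kept as strings; bare numeric lines that follow a key are
    collected as floats in a list stored under '<Key>.values'.
    """
    params = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            params[key] = value
            last_key = key
        elif last_key is not None:
            params.setdefault(last_key + ".values", []).append(float(line))
        # lines before the first key (e.g. the title 'Sim EOX.par') are ignored
    return params
```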

4.2. Input file *.pm

A *.pm file contains the molecular model for a given substance, i.e. the relative positions and the parameters of all sites. The potential model file for methanol is shown in Table 4. Methanol was modeled by two LJ sites and three point charges [17]. All positions and distances in the *.pm file are given in Å, the LJ parameters $\sigma$ and $\varepsilon/k_\mathrm{B}$ are given in Å and K, respectively, while the mass is given in atomic units (u = 1.6605 × 10⁻²⁷ kg). The magnitudes of the charges are specified in elementary charges (e = 1.602 × 10⁻¹⁹ C), while the dipole moments and quadrupole moments are given in Debye (D = 3.33564 × 10⁻³⁰ C m) and Buckingham (B = 3.33564 × 10⁻⁴⁰ C m²), respectively. The orientations of the dipole and quadrupole are represented by spherical coordinates, where the azimuthal angle $\phi$ specifies the angle to the positive x-axis and the polar angle $\theta$ defines the angle to the positive z-axis. Both angles are specified in degrees. Molecular models can be oriented arbitrarily in the *.pm file. All site positions are transformed into a principal axes coordinate system at the beginning of each simulation with ms2. The normalized site positions are written to a *.nrm file for each component.

Table 4: Parameters and options specified in the *.pm file for a molecular model of methanol.

MeOH.pm
NSiteTypes = 2
SiteType = LJ126
NSites = 2
x = 7.660331E-01
y = 1.338147E-02
z = 0.0
sigma = 3.754348
epsilon = 120.591759
mass = 15.034
x = -6.564695E-01
y = -6.389332E-02
z = 0.0
sigma = 3.030
epsilon = 87.879094
mass = 16.00
SiteType = Charge
NSites = 3
x = 7.660331E-01
y = 1.338147E-02
z = 0.0
charge = 0.247461
mass = 0.0
shielding = 0.1
x = -6.564695E-01
y = -6.389332E-02
z = 0.0
charge = -0.678742
mass = 0.0
shielding = 0.1
x = -1.004989E+00
y = 8.145993E-01
z = 0.0
charge = 0.431281
mass = 1.008
shielding = 0.05
NRotAxes = auto
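The principal axes transformation mentioned in Section 4.2 can be sketched with NumPy for the methanol model of Table 4: shift the sites to the center of mass and rotate them onto the eigenvectors of the inertia tensor. Only the three sites with non-zero mass contribute. Ordering the axes by ascending moment of inertia, and the sign of each axis, are assumptions of this sketch; ms2's *.nrm convention may differ.

```python
import numpy as np

# sites with non-zero mass in Table 4: CH3 and O (LJ sites) and H (charge site)
POSITIONS = np.array([[7.660331e-01, 1.338147e-02, 0.0],
                      [-6.564695e-01, -6.389332e-02, 0.0],
                      [-1.004989e00, 8.145993e-01, 0.0]])  # Angstrom
MASSES = np.array([15.034, 16.00, 1.008])                  # u


def inertia_tensor(r, masses):
    """I = sum_k m_k (|r_k|^2 * 1 - r_k r_k^T) for positions r relative to the COM."""
    return ((masses * (r ** 2).sum(axis=1)).sum() * np.eye(3)
            - (masses[:, None] * r).T @ r)


def principal_axes(positions, masses):
    """Return site coordinates in the principal axes frame and the principal moments."""
    com = masses @ positions / masses.sum()
    r = positions - com
    moments, axes = np.linalg.eigh(inertia_tensor(r, masses))  # ascending eigenvalues
    return r @ axes, moments  # columns of axes are the principal directions
```

Because the model is planar, the largest principal moment equals the sum of the other two, which offers a quick check of the transformation.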

4.3. Output files

ms2 yields eight output files:

• *.log file - stores a complete summary of all execution steps taken by ms2.
• *.res file - contains the results of the simulation in aggregated form. The data are written to file in reduced quantities as well as in SI units, along with the statistical uncertainties of the calculated properties. The *.res file is created during the simulation and updated after every specified number of time steps or loops.
• *.run file - contains the calculated properties of the simulation for a specified time step or loop interval. The file is in tabular form, with the data given in reduced units. It is updated periodically according to the user specification in the *.par file.
• *.rav file - contains the block averages of the calculated properties. The file is in tabular form, with the data given in reduced units. It is updated periodically according to the user specification in the *.par file.
• *.rtr file - stores the final values of the autocorrelation functions and their integrals. The number of output lines has to be defined in the *.par file.
• *.rst file - is the restart file of the simulation. It contains all molecular positions, velocities, orientations, forces, torques and block averages for the thermodynamic properties. It is written once at the end of a simulation or immediately after a termination signal of the operating system has been received. The *.rst file allows a simulation to be executed stepwise, which is necessary e.g. after an early interruption of the simulation, because of time limits on a queuing system, or after unexpected halts.
• *.vim file - is the trajectory visualization file. It contains the positions and orientations of all molecules in an aggregated ASCII format. The configurations are written to file after a user-specified interval of time steps or loops. The *.vim file is readable by the visualization tool ms2molecules, which is also part of the simulation package.
• *.nrm file - stores the normalized coordinates of a potential model after a principal axes transformation.
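The *.rav file stores block averages whose size is set by ResultFreq (Table 2). One common way to turn such block averages into a statistical uncertainty is sketched below: average consecutive, non-overlapping blocks and take the standard error of the block means. Whether ms2 computes the uncertainties in the *.res file exactly this way is not stated here, so the sketch is only illustrative, and the function name is hypothetical.

```python
import statistics


def block_average(samples, block_size):
    """Mean and standard error from non-overlapping block averages.

    Trailing samples that do not fill a complete block are discarded.
    """
    n_blocks = len(samples) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two complete blocks")
    blocks = [statistics.fmean(samples[k * block_size:(k + 1) * block_size])
              for k in range(n_blocks)]
    mean = statistics.fmean(blocks)
    sem = statistics.stdev(blocks) / n_blocks ** 0.5  # standard error of the mean
    return mean, sem
```

The block size should exceed the correlation time of the data; otherwise the standard error is underestimated.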

References

[1] Allen, M. and Tildesley, D., Computer Simulation of Liquids, Clarendon Press, Oxford, 1987.

[2] Andersen, H., The Journal of Chemical Physics 72 (1980) 2384.

[3] Rowley, R. et al., DIPPR Data Compilation of Pure Compound Properties, Design Institute for Physical Properties, AIChE, 2003.

[4] Widom, B., The Journal of Chemical Physics 39 (1963) 2808.

[5] Gubbins, K., Statistical Mechanics, volume 1, The Chemical Society Burlington House, London, 1972.

[6] Hirschfelder, J., Curtiss, C., and Bird, R., Molecular Theory of Gases and Liquids, J. Wiley & Sons, 1954.

[7] Lorrain, P., Corson, D., and Lorrain, F., Elektromagnetische Felder und Wellen, Walter de Gruyter, Berlin, 1995.

[8] Gray, C. and Gubbins, K., Theory of Molecular Fluids, Volume 1: Fundamentals, Clarendon Press, Oxford, 1984.

[9] Frenkel, D. and Smit, B., Understanding Molecular Simulation, Academic Press, Elsevier, San Diego, 1993.

[10] Nymand, T. and Linse, P., The Journal of Chemical Physics 112 (2000) 6152.

[11] Boettcher, C., van Belle, O., Bordewijk, P., and Rip, A., Theory of Electric Polarization, Vol. 1: Dielectrics in Static Fields, Elsevier, 1973.

[12] Lisal, M., Budinsky, R., and Vacek, V., Fluid Phase Equilibria 135 (1997) 193.

[13] Garzon, B., Lago, S., Vega, C., and Rull, L., The Journal of Chemical Physics 102 (1995) 7204.

[14] Lisal, M., William, W., and Nezbeda, I., Fluid Phase Equilibria 181 (2001) 127.

[15] Jedlovszky, P. and Mezei, M., The Journal of Chemical Physics 110 (1999) 2991.

[16] Lustig, R., Molecular Physics 65 (1988) 175.

[17] Schnabel, T., Srivastava, A., Vrabec, J., and Hasse, H., The Journal of Physical Chemistry B 111 (2007) 9871.