TRANSPORT PROBLEMS 2019 Volume 14 Issue 3
PROBLEMY TRANSPORTU DOI: 10.20858/tp.2019.14.3.15
Keywords: multibody dynamics; parallel computations; co-simulation; railway research
Dmitry POGORELOV, Alexander RODIKOV, Roman KOVALEV*
Bryansk State Technical University, Laboratory of Computational Mechanics
bulv. 50 let Oktyabrya 7, Bryansk, 241035, Russia
*Corresponding author. E-mail: kovalev@umlab.ru
PARALLEL COMPUTATIONS AND CO-SIMULATION IN UNIVERSAL
MECHANISM SOFTWARE. PART I: ALGORITHMS AND
IMPLEMENTATION
Summary. Parallel computations speed up the simulation of multibody system dynamics,
in particular the dynamics of railway vehicles and trains. This matters for reducing the
time required at the design stage of new railway vehicles, for increasing the complexity of
the problems that can be studied and for real-time applications. We consider the
implementation of parallel computations in Universal Mechanism software in three different
areas: simulation of rail vehicle and train dynamics, evaluation of wheel profile wear and
multi-variant computations. The use of clusters for parallel running of multi-variant
computations is illustrated. Co-simulation based on the interface between Universal
Mechanism and Matlab/Simulink and other software tools is discussed.
1. INTRODUCTION
In this paper, we consider three areas for parallel computations:
- simulation of rail vehicle and train dynamics,
- evolution of railway wheel profile due to wear and
- multi-variant computations.
In the first two cases, the parallelism is realized on multi-core processors by multithreading.
The third one is based on the multiprocess technique, which can use not only the local computer
but also network computational resources.
Parallel computation is a well-known tool for speeding up the analysis of different scientific and
engineering tasks. Compared with general multibody system (MBS) analysis, rail vehicle research has
several specific features that should be taken into account [1]. For instance, dynamic models often
include, along with vehicles, flexible rails and bridges, which increase the number of degrees of
freedom and make parallel computations desirable.
The literature on parallelism in simulations of MBS dynamics includes hundreds of publications.
The main theoretical results in this field relate to the parallel treatment of articulated body
systems with long kinematic chains. The divide-and-conquer algorithm (DCA) [2, 3] and the constraint
force algorithm (CFA) [4] are O(log N) algorithms, i.e. they require O(log N) time to simulate an
articulated MBS with N rigid bodies on N processors. DCA, CFA and further developments of these
algorithms [5-10] do not support implicit solver procedures for stiff MBS, which limits their
application to the simulation of rail vehicle dynamics.
Another important area of application of parallelism is related to molecular dynamics and
mathematically similar problems such as the discrete element method (DEM), smoothed particle
hydrodynamics (SPH) and so on [11, 12]. In railway dynamics, these methods are relevant to the
simulation of track ballast [13, 14] as well as of fluid and granular media sloshing [15].
164 D. Pogorelov, A. Rodikov, R. Kovalev
A promising method of parallelization when the number of processors is limited consists in
dividing an MBS into several subsystems. Subsystems are selected in different ways, either by
natural grouping of bodies and force elements [16, 17] or by cutting some joints [18]. Weakly
connected subsystems are considered in the article by Getmansky & Gorobtsov [19].
In the limiting case, any MBS can be considered as a set of separate bodies after cutting all joints.
This method corresponds to the Cartesian MBS formulation. Joints are taken into account by
constraint equations [20, 21] or replaced by spring-and-damper force elements (compliant joints)
[22-24]. Replacing rigid joints by compliant ones allows a very efficient parallelization of any MBS,
but it makes the equations of motion stiff. This fact was the main reason why compliant joints could
not be widely used in MBS software codes. To overcome the problems connected with the stiffness of
the equations, a heterogeneous multiscale method for numerical integration is proposed in Valasek
and Mraz, as well as Mraz and Valasek [22, 23], whereas a special implicit solver is used in
Pogorelov [24]. This paper includes an advanced description of the parallel algorithm proposed in
Pogorelov [24] and implemented in Universal Mechanism (UM) software [25] for simulation of the
dynamics of general multibody systems and, in particular, rail vehicles and trains.
The Cartesian formulation provides a well-balanced parallel computation of kinematic relations,
equations of motion and forces. At the same time, a large system of linear equations with a sparse
system matrix must be solved many times during the integration process. In the case of rigid joints,
the system matrix is symmetric but not positive definite and includes the mass matrix and the
Jacobian matrix of constraint equations as blocks. The size of the matrix is 6N+m, where N is the
number of bodies and m is the number of constraint equations [20]. If compliant joints are used, the
system matrix is a sum of the mass matrix and the Jacobian matrix of stiff forces including compliant
joints, the matrix size is 6N, and it is positive definite [26, 27]. Linear equations with a sparse
matrix can be solved in parallel with the preconditioned conjugate gradient method (PCGM) [28-30];
some papers [31, 32] give examples of PCGM use in parallel MBS simulations. As shown in Section 2
of this paper, a block-diagonal preconditioning matrix can be constructed that is dominant in a
definite sense, so that only a small number of PCGM iterations are required.
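The role of a dominant diagonal preconditioner can be illustrated with a minimal Python sketch. It makes several simplifying assumptions: every block of the preconditioner is 1×1 (plain Jacobi preconditioning), the matrix is small and dense, and the names `pcg_diag`, `matvec` and `dot` as well as the test matrix are made up for illustration. This is the textbook preconditioned conjugate gradient method, not the UM implementation.

```python
from math import sqrt


def matvec(a, x):
    """Dense matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def pcg_diag(a, b, tol=1e-10, max_iter=50):
    """Solve a x = b by conjugate gradients preconditioned with diag(a)."""
    n = len(b)
    inv_diag = [1.0 / a[i][i] for i in range(n)]
    x = [0.0] * n
    r = b[:]                                   # residual b - a x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]  # preconditioned residual
    p = z[:]
    rz = dot(r, z)
    for k in range(1, max_iter + 1):
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if sqrt(dot(r, r)) < tol:
            return x, k
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Made-up symmetric positive definite matrix with a dominant diagonal
# (standing in for "mass matrix plus weak coupling between bodies").
a = [[4.0, -0.1, 0.0],
     [-0.1, 5.0, -0.2],
     [0.0, -0.2, 6.0]]
b = [1.0, 2.0, 3.0]
x, iters = pcg_diag(a, b)
print(iters, x)
```

Because the off-diagonal coupling is weak compared with the diagonal, the preconditioned residual is already close to the solution update, which is the effect the paragraph above refers to.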
In the case of MBS tasks, parallel computations are usually executed on multicore processors, GPUs
and clusters. In our opinion, simulations of rail vehicle and train dynamics should be implemented on
multicore processors. GPUs are highly productive in solving problems of molecular dynamics, and
clusters are efficient for multi-variant simulations, in particular when solving optimization
problems.
Parallel simulation of MBS dynamics on multicore processors as it is implemented in UM software
is considered in Section 2. The section includes general forms of equations of motion for rigid and
flexible body systems according to the Cartesian formalism with compliant joints, description of the
Park implicit solver and details of a parallel algorithm based on the fork-join method. Examples of
parallel simulation of railway vehicle and train dynamics are given in Part II of this paper.
The implementation of a parallel approach to predicting the wear of railway wheel profiles on
multicore processors is considered in Section 3. Two different methods are usually used for the
prediction of profile wear [33]: the sequential method [34, 35] and the parallel one [36, 37]. Both
methods consider simulation of the rail vehicle dynamics on track sections with different geometry,
taking into account various irregularities, rail profiles, vehicle mass, speeds and so on. The
methods differ in the strategy of profile modification. In the sequential algorithm, the profiles are
modified at the end of each simulation. In the parallel algorithm, the modification of profiles due
to wear is performed many times, at small intervals of the traveling distance. The parallel algorithm
is faster than the sequential one, but the latter is applicable to rail profile wear as well. An
example of the use of a multicore processor for speeding up the evaluation of wheel profile wear is
given in Part II of this paper.
Parallel implementation of multi-variant computations in UM software is considered in Section 4.
Some problems connected with co-simulation in MBS dynamics are discussed in Section 5.
We will use the following terms and definitions below. A process is an executing instance of a
program. On a multiprocessor system, multiple processes can be executed in parallel. A thread is a
subset of the process. A process has at least one thread. A process may also be made up of multiple
threads of execution that execute instructions concurrently. Multiple threads of control can exploit
the true parallelism possible on multiprocessor systems.
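The relation between one process and its threads can be sketched in Python as follows. This is only a structural illustration under stated assumptions: the functions `split`, `parallel_section` and `kinetic_energy` and the body data are hypothetical, and in the standard CPython interpreter the global interpreter lock prevents pure Python threads from running on several cores at once, so the sketch shows the fork and join pattern rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, n_threads):
    """Split items into n_threads near-equal contiguous chunks."""
    size, extra = divmod(len(items), n_threads)
    chunks, start = [], 0
    for t in range(n_threads):
        end = start + size + (1 if t < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_section(work, chunks, pool):
    """Fork one task per chunk, then wait for all of them (the join)."""
    return list(pool.map(work, chunks))


def kinetic_energy(chunk):
    """Work done by one thread: sum of m v^2 / 2 over its bodies."""
    return sum(0.5 * m * v * v for m, v in chunk)


# Made-up (mass, speed) pairs for ten bodies.
bodies = [(1.0 + i, 0.1 * i) for i in range(10)]

# One process, four worker threads sharing the same memory.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial = parallel_section(kinetic_energy, split(bodies, 4), pool)
total = sum(partial)
print(total)
```

The master thread only continues to `sum(partial)` after every worker has returned, which plays the role of a barrier at the end of a parallel section.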
2. PARALLEL SIMULATION OF RAIL VEHICLE AND TRAIN DYNAMICS
In this section, we consider an algorithm for parallel simulation of rail vehicle and train dynamics
on computers with multi-core processors.
2.1. Equations of motion
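First, consider a rigid body with six degrees of freedom. The position of body i is set by three
Cartesian coordinates $\mathbf r_i=(x_i,y_i,z_i)^T$ of the origin of the body-fixed system of
coordinates and by three orientation coordinates $\mathbf p_i$. The kinematics of body i includes
formulas for the linear and angular velocities and accelerations
$\mathbf v_i,\mathbf a_i,\boldsymbol\omega_i,\boldsymbol\varepsilon_i$ as well as for the direction
cosine matrix $\mathbf A_{0i}$:
$$\mathbf v_i=\dot{\mathbf r}_i,\quad \mathbf a_i=\ddot{\mathbf r}_i,\quad \mathbf A_{0i}=\mathbf A_{0i}(\mathbf p_i),\quad \boldsymbol\omega_i=\mathbf B_i\dot{\mathbf p}_i,\quad \boldsymbol\varepsilon_i=\mathbf B_i\ddot{\mathbf p}_i+\boldsymbol\varepsilon'_i. \qquad (1)$$
Newton-Euler dynamic equations in accelerations are as follows:
$$\mathbf M_{ir}\mathbf w_{ir}=-\mathbf k_{ir}+\sum_j\mathbf G_{ijr},\qquad \mathbf w_{ir}=(\mathbf a_i^T,\boldsymbol\varepsilon_i^T)^T=\mathbf W_{ir}\ddot{\mathbf q}_{ir}+\mathbf w'_{ir},\qquad \mathbf q_{ir}=(\mathbf r_i^T,\mathbf p_i^T)^T,\qquad \mathbf w'_{ir}=(\mathbf 0^T,\boldsymbol\varepsilon_i'^T)^T, \qquad (2)$$
where $\mathbf M_{ir}$ is the mass matrix, and $\mathbf w_{ir},\mathbf k_{ir},\mathbf G_{ijr}$ are the
$6\times 1$ vectors of accelerations, inertia forces and applied forces:
$$\mathbf M_{ir}=\begin{pmatrix} m_i\mathbf E_3 & -m_i\tilde{\boldsymbol\rho}_{ci}\\ m_i\tilde{\boldsymbol\rho}_{ci} & \mathbf I_i\end{pmatrix},\quad \mathbf W_{ir}=\begin{pmatrix}\mathbf E_3 & \mathbf 0\\ \mathbf 0 & \mathbf B_i\end{pmatrix},\quad \mathbf k_{ir}=\begin{pmatrix} m_i\tilde{\boldsymbol\omega}_i\tilde{\boldsymbol\omega}_i\boldsymbol\rho_{ci}\\ \tilde{\boldsymbol\omega}_i\mathbf I_i\boldsymbol\omega_i\end{pmatrix},\quad \mathbf G_{ijr}=\begin{pmatrix}\mathbf F_{ij}\\ \mathbf T_{ij}\end{pmatrix}.$$
Here $m_i$ is the mass, $\mathbf I_i$ is the matrix of the inertia tensor, $\boldsymbol\rho_{ci}$ is
the radius-vector of the centre of mass relative to the body-fixed frame,
$\mathbf F_{ij},\mathbf T_{ij}$ are the applied force and torque corresponding to the interaction
between bodies i and j, including forces for compliant joints, and $\mathbf E_3$ is the identity
matrix. The symbol 'tilde' over a vector denotes the skew-symmetric matrix generated by the vector
and used for the matrix notation of the vector product.
Flexible bodies are used in railway research for modelling both parts of vehicles and elements of
track infrastructure such as rails, sleepers and bridges. Equations of motion of flexible bodies in
multibody system dynamics are often derived with the Craig-Bampton method [38]. The body movement
is split into a gross motion of a reference frame and small elastic deformations [39]. The
coordinates and kinematics of the reference frame coincide with those of a rigid body (1). The
vector to a separate body node k is as follows:
$$\mathbf r_{ik}=\mathbf r_i+\mathbf A_{0i}(\boldsymbol\rho_{ik}+\Delta\mathbf u_{ik}),$$
with $\Delta\mathbf u_{ik}$ the elastic displacement of the node relative to the reference frame.
According to the Craig-Bampton methodology, a set of constraint and fixed interface (static and
dynamic) nodal modes specifies the elastic deformations so that the node displacement
$\Delta\mathbf u_{ik}$ and rotation $\Delta\boldsymbol\pi_{ik}$ are linear functions of the
generalized elastic coordinates $\mathbf q_{if}$:
$$\Delta\mathbf u_{ik}=\mathbf H_{iku}\mathbf q_{if},\qquad \Delta\boldsymbol\pi_{ik}=\mathbf H_{ik\pi}\mathbf q_{if}.$$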
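The node position formula above can be checked numerically with a small Python sketch. All numbers are made up for illustration: the reference frame is rotated by 90 degrees about the z axis, the mode matrix `h_iku` has two hypothetical modes, and the helper names `rot_z`, `matvec` and `node_position` are not taken from any library.

```python
from math import cos, sin, pi


def rot_z(angle):
    """Direction cosine matrix of a rotation about the z axis."""
    c, s = cos(angle), sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def node_position(r_i, a_0i, rho_ik, h_iku, q_if):
    """r_ik = r_i + A_0i (rho_ik + H_iku q_if)."""
    du = matvec(h_iku, q_if)                       # elastic displacement, body frame
    local = [p + d for p, d in zip(rho_ik, du)]    # deformed node position, body frame
    return [r + g for r, g in zip(r_i, matvec(a_0i, local))]


r_i = [10.0, 0.0, 1.0]            # origin of the reference frame
rho_ik = [1.0, 0.0, 0.0]          # undeformed node position in the body frame
h_iku = [[0.0, 0.0],              # 3 x 2 matrix of two hypothetical modes
         [0.5, 0.0],
         [0.0, 1.0]]
q_if = [0.02, -0.01]              # generalized elastic coordinates

r_ik = node_position(r_i, rot_z(pi / 2), rho_ik, h_iku, q_if)
print(r_ik)
```

The elastic displacement is computed in the body frame and only then rotated into the global frame, which is why the gross motion and the small deformation can be treated separately.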
The equations of motion of a flexible body are as follows:
$$\mathbf M_{if}\mathbf w_{if}+\mathbf C_{if}\dot{\mathbf q}_i+\mathbf K_{if}\mathbf q_i+\mathbf k_{if}=\sum_k\mathbf G_{ikf},\qquad \mathbf w_{if}=\begin{pmatrix}\mathbf a_i\\ \boldsymbol\varepsilon_i\\ \ddot{\mathbf q}_{if}\end{pmatrix},\qquad \mathbf G_{ikf}=\begin{pmatrix}\mathbf F_{ik}\\ \mathbf T_{ik}+\tilde{\boldsymbol\rho}_{ik}\mathbf F_{ik}\\ \mathbf H_{iku}^T\mathbf A_{0i}^T\mathbf F_{ik}+\mathbf H_{ik\pi}^T\mathbf A_{0i}^T\mathbf T_{ik}\end{pmatrix}, \qquad (3)$$
where $\mathbf C_{if},\mathbf K_{if}$ are the damping and stiffness matrices, and
$\mathbf F_{ik}$ and $\mathbf T_{ik}$ are the force and torque applied to node k.
Finally, consider equations of motion for a flexible structure in nodal coordinates, which can be
used for modeling flexible rails and sleepers. In contrast to the previous case, there is no
nonlinear gross motion, and the equations of motion can be derived according to the usual finite
element method. Both rails and sleepers are considered here as Timoshenko or Euler-Bernoulli beams
with the following equations of motion for a beam with $n_{ib}$ nodes:
$$\mathbf M_{ib}\ddot{\mathbf q}_{ib}+\mathbf C_{ib}\dot{\mathbf q}_{ib}+\mathbf K_{ib}\mathbf q_{ib}=\sum_k\mathbf G_{ikb}. \qquad (4)$$
Here $\mathbf q_{ib}$ is the vector of $6n_{ib}$ nodal coordinates,
$\mathbf M_{ib},\mathbf C_{ib},\mathbf K_{ib}$ are block tridiagonal mass, damping and stiffness
matrices of size $6n_{ib}\times 6n_{ib}$ and $\mathbf G_{ikb}$ is the $6\times 1$ vector of applied
forces and torques at node k.
The general form of the equations of motion for body i of types (2-4) is as follows:
$$\mathbf M_i\mathbf w_i+\mathbf C_i\dot{\mathbf q}_i+\mathbf K_i\mathbf q_i+\mathbf k_i=\sum_j\mathbf G_{ij},\qquad \mathbf w_i=\mathbf W_i\ddot{\mathbf q}_i+\mathbf w'_i. \qquad (5)$$
The matrices $\mathbf C_i,\mathbf K_i$ in (5) should be omitted for rigid bodies.
2.2. Numerical method
Equations of motion (5) are stiff from the numerical point of view [40] for two main reasons. First,
if flexible bodies are present in the model, internal elastic and damping forces are stiff. Second,
compliant joints, contact forces and some other force elements make the equations stiff as well. The
implicit Park numerical method is used in UM software for solving stiff equations of motion [41]. The
linear multistep Park method is based on a combination of backward differentiation formulas of the
second and third orders. Consider some details of the method that are important for the description
of the parallel computations.
At the beginning of each integration step, predictor values $\mathbf q_i^p$ of the coordinates for
body i are computed. Here and below we omit the step index. The unknown corrector values
$\delta\mathbf q_i$ are introduced so that $\mathbf q_i=\mathbf q_i^p+\delta\mathbf q_i$. The Park
finite difference formula is used to express the time derivatives of coordinates and the
accelerations in terms of the corrector value:
$$\dot{\mathbf q}_i=\dot{\mathbf q}_i^p+\delta\mathbf q_i/\beta,\qquad \ddot{\mathbf q}_i=\ddot{\mathbf q}_i^p+\delta\mathbf q_i/\beta^2,\qquad \mathbf w_i=\mathbf w_i^p+\mathbf W_i\,\delta\mathbf q_i/\beta^2,\qquad \beta=0.6h. \qquad (6)$$
Here h is the step size of integration.
Let us introduce new unknown variables
$$\delta\mathbf q'_i=\mathbf W_i\,\delta\mathbf q_i, \qquad (7)$$
in which the equations can be written in a more compact form. For example,
$$\mathbf w_i=\mathbf w_i^p+\delta\mathbf q'_i/\beta^2. \qquad (8)$$
Now we introduce Jacobian matrices (JM) for stiff force elements. Let the generalized force
$\mathbf G_{ij}$ in Equations (5) be stiff, where i, j are the indices of the interacting bodies.
Suppose the force depends on
positions and velocities of interacting bodies. The implicit integration scheme requires an expansion
of stiff forces and torques in series of $\delta\mathbf q'_i,\delta\mathbf q'_j$ neglecting nonlinear
terms:
$$\mathbf G_{ij}=\mathbf G_{ij}^p+\mathbf J_{ij,i}\,\delta\mathbf q'_i+\mathbf J_{ij,j}\,\delta\mathbf q'_j, \qquad (9)$$
where $\mathbf J_{ij,i},\mathbf J_{ij,j}$ are the JM of forces. Approximate expressions for the JM of
different forces and compliant joints are derived in Pogorelov [26, 27]. It is important that for the
stiff forces used in simulation of rail vehicles, the matrix
$$\begin{pmatrix}\mathbf J_{ij,i} & \mathbf J_{ij,j}\\ \mathbf J_{ji,i} & \mathbf J_{ji,j}\end{pmatrix}$$
composed of the approximate JM is negative semidefinite [26, 27].
Substitution of Equations (7-9) in (5) yields linear equations relative to the unknown values
$\delta\mathbf q'_i$:
$$\mathbf M'_i\,\delta\mathbf q'_i-\beta^2\sum_{j\in S_i}\mathbf J_{ij,j}\,\delta\mathbf q'_j=\beta^2\mathbf Q_i \qquad (10)$$
with the set $S_i$ of stiff force indices for body i, the positive definite matrix
$$\mathbf M'_i=\mathbf M_i^p+\beta\mathbf C_i+\beta^2\mathbf K_i-\beta^2\sum_{j\in S_i}\mathbf J_{ij,i} \qquad (11)$$
and the vector $\mathbf Q_i$ summarizing all forces
$$\mathbf Q_i=-\mathbf M_i^p\mathbf w_i^p-\mathbf C_i\dot{\mathbf q}_i^p-\mathbf K_i\mathbf q_i^p-\mathbf k_i^p+\sum_j\mathbf G_{ij}^p. \qquad (12)$$
The PCGM [28, 29] is suitable for the parallel solution of Equations (10), and the matrix
$\mathbf M'=\mathrm{diag}\{\mathbf M'_i\}$ is highly efficient as the preconditioning matrix in this
algorithm. To explain this statement, consider Equations (10) in the full matrix form
$$(\mathbf M'-\mathbf J_\Delta)\,\delta\mathbf q'=\mathbf Q,$$
where $\mathbf J_\Delta$ and $\mathbf Q$ here include the factor $\beta^2$ of (10).
According to a theorem proved in Pogorelov [26], the spectral radius of the matrix
$\mathbf M'^{-1}\mathbf J_\Delta$ is less than 1,
$$\rho(\mathbf M'^{-1}\mathbf J_\Delta)<1.$$
In this sense, the preconditioning matrix is a dominant one and a good convergence of the
PCGM iterations is observed. Since the mass matrix is completely included in the preconditioning
matrix, the PCGM iterations serve to stabilize the numerical method rather than to increase the
accuracy. As a result, the number of iterations is usually small and rarely exceeds one.
Consider the steps of the PCGM for solving Equations (10) in a form suitable for parallel
computations.
- Cholesky factorization of blocks of the preconditioning matrix (11):
$$\mathbf M'_i=\mathbf L_i\mathbf L_i^T. \qquad (13)$$
Now Equations (10) can be rewritten for the variables
$\delta\mathbf q_i^L=\mathbf L_i^T\delta\mathbf q'_i$:
$$\delta\mathbf q_i^L-\sum_{j\in S_i}\mathbf J_{ij,j}^L\,\delta\mathbf q_j^L=\mathbf Q_i^L,\qquad \mathbf J_{ij,j}^L=\beta^2\mathbf L_i^{-1}\mathbf J_{ij,j}\mathbf L_j^{-T},\qquad \mathbf Q_i^L=\beta^2\mathbf L_i^{-1}\mathbf Q_i. \qquad (14)$$
- PCGM start: computation of the initial solution $\delta\mathbf q_i^{L0}$, direction
$\mathbf d_i^0$ and residual $\mathbf r_i^0$:
$$\delta\mathbf q_i^{L0}=\mathbf Q_i^L,\qquad \mathbf r_i^0=\mathbf d_i^0=-\sum_{j\in S_i}\mathbf J_{ij,j}^L\,\delta\mathbf q_j^{L0},\qquad \gamma=0. \qquad (15)$$
- PCGM iterations include three steps; k = 1, 2, … is the index of iterations.
CGM1. $\mathbf d_i^k=\mathbf r_i^{k-1}+\gamma\,\mathbf d_i^{k-1}$; $\mathbf d_i^{Lk}=\mathbf L_i^{-T}\mathbf d_i^k$.
CGM2. $\mathbf c_i^k=\mathbf d_i^k-\beta^2\mathbf L_i^{-1}\sum_{j\in S_i}\mathbf J_{ij,j}\,\mathbf d_j^{Lk}$; $\alpha=-\sum_i\mathbf d_i^{kT}\mathbf r_i^{k-1}\big/\sum_i\mathbf d_i^{kT}\mathbf c_i^k$.
CGM3. $\delta\mathbf q_i^{Lk}=\delta\mathbf q_i^{L,k-1}+\alpha\,\mathbf d_i^k$; $\mathbf r_i^k=\mathbf r_i^{k-1}+\alpha\,\mathbf c_i^k$; $\rho_k=\sum_i\mathbf r_i^{kT}\mathbf r_i^k$; $\gamma=\rho_k/\rho_{k-1}$.
Iterations stop when $\rho_k<\varepsilon$, where $\varepsilon$ is the error tolerance.
- After the convergence of iterations, the corrector values are computed
$$\delta\mathbf q'_i=\mathbf L_i^{-T}\delta\mathbf q_i^L, \qquad (16)$$
and the final calculation of the model coordinates is done:
$$\mathbf q_i=\mathbf q_i^p+\mathbf W_i^{-1}\delta\mathbf q'_i. \qquad (17)$$
2.3. Parallel solution of equations of motion on multi-core processors
A technique based on Windows API threads is used for the practical realization of parallel
computations in UM according to the algorithms described in Section 2.2. The fork-join model of
parallelism uses a master thread and several parallel sections (PSs) on each integration step [42],
Fig. 1. It is recommended that the total number of threads not exceed the number of cores on the
local computer. There is a barrier at the end of each PS, so the balancing of the computational load
between the threads is important.
Fig. 1. Fork-join model of parallel computation
[Diagram: within one integration step, the master thread forks Thread 1, Thread 2, …, Thread N for
PS1, waits at a barrier, forks them again for PS2, waits at the next barrier, and so on.]
Consider the list of PSs according to the algorithm described in Section 2.2.
PS1. Prediction of $\mathbf q_i^p,\dot{\mathbf q}_i^p,\ddot{\mathbf q}_i^p$, computation of kinematics (1), mass matrices $\mathbf M_i$, internal elastic and
damping forces $\mathbf C_i\dot{\mathbf q}_i^p+\mathbf K_i\mathbf q_i^p$, inertia and gravity forces $\mathbf k_i$ in Equations (5).
PS2. Evaluation of forces $\mathbf G_{ij}^p$ and Jacobian matrices $\mathbf J_{ij,i},\mathbf J_{ij,j}$, Equations (9).
PS3. Evaluation (11) and factorization (13) of the preconditioning matrices $\mathbf M'_i$, computation of
total forces $\mathbf Q_i$ in Equations (12) as well as $\mathbf Q_i^L$ in (14).
PS4. Computation of start vectors $\delta\mathbf q_i^{L0},\mathbf r_i^0,\mathbf d_i^0$ of the PCGM (15).
The next three sections correspond to iterations of PCGM.
PS5. Evaluation of $\mathbf d_i^k$, CGM1.
PS6. Evaluation of $\mathbf c_i^k$, CGM2.
PS7. Evaluation of $\delta\mathbf q_i^{Lk},\mathbf r_i^k$, CGM3.
PS8. Computation of the corrector values $\delta\mathbf q'_i$ (16).
PS9. Computation of new values of coordinates (17), velocities and accelerations (6).
All PSs perform computations independently for each of the bodies except the evaluation of forces
and Jacobians in PS2. Thus, a good balancing of the CPU load in the case of a rigid body MBS is
usually possible after splitting the total set of bodies into near-equal subsets. To decrease the
thread load imbalance in the computation of forces in PS2, the force elements are first grouped
according to their type and then uniformly distributed between the threads.
In the case of a hybrid MBS including flexible bodies, an improvement of CPU load balancing can
be achieved by using a simple algorithm for optimal distribution of computations between threads.
The algorithm relates to the combinatorial generalized stone problem (a multiway number partitioning
problem). Let $w_{ij}$ be the average CPU time necessary to run computational procedure i in PSj for
one body or force element. We accept this parameter as the weight of the procedure in the
corresponding PS. The expected computation load for thread k is as follows:
$$W_{jk}\approx\sum_{i\in S_{jk}}w_{ij},$$
where $S_{jk}$ is the set of bodies or forces assigned to thread k in PSj. The optimal distribution
of computation procedures between threads is formulated as the minimax problem
$$\min\max_k W_{jk},$$
i.e. as minimization of the maximal CPU load by changing the sets $S_{jk}$. An appropriate
approximation of the optimal solution for N threads can be found by the following algorithm:
- computational procedures are sorted in decreasing order of the weight, $w_{1j}\ge w_{2j}\ge\dots$;
- the first N procedures with the maximal weights are assigned one to each thread;
- each of the next procedures is assigned to the thread with the minimum aggregate weight.
3. EVOLUTION OF RAILWAY WHEEL PROFILE DUE TO WEAR
UM software has a special tool for predicting the evolution of railway wheel profiles due to wear.
In this tool, a parallel discrete approach has been implemented. This algorithm supposes a parallel
simulation of a railway vehicle model in different configurations, which approximate the real
operational conditions of the vehicle. Configurations differ in track geometry and irregularities,
rail profiles, vehicle mass, speed and so on. The set of configurations should be representative of
the conditions in which the rail vehicle is operated.
The track length travelled by the vehicle during the simulation is divided into a sequence of
intervals (wear steps). The number of intervals is the same for all the configurations. The wheel
profiles are kept unchanged within each interval. The wheel profiles are modified at the end of each
wear step according to weighted diagrams of the distribution of friction work along the wheel
profile. A small quantity of material is removed at each wear step. A large number of steps results
in a smooth profile modification. The calculation of material losses is based on the theory proposed
by J.F. Archard [43]. Contact forces are computed using the model of W. Kik and J. Piotrowski [33]
or the CONTACT library [44].
Multithread parallel computations are realized as follows (Fig. 2):
- at each wear step, the configurations are computed in parallel using a fixed number of threads,
- the computation barrier at the end of the section corresponds to the end of the wear step and
- modification of profiles and rebalancing of threads are executed before the start of the next step.
The weight of a configuration for thread balancing is equal to its CPU time at the previous wear
step. Similar to the optimization procedure described in Section 2.3, the list of configurations is
sorted in decreasing order of the weights. At the beginning of a new step, the parallel computation
starts with the configurations of the maximal weight, and when a thread finishes the computation for
the assigned configuration, it obtains the next one with the maximal weight from the rest of the list.
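The balancing rule used here and in Section 2.3 (sort by decreasing weight, then always give the next item to the thread with the smallest aggregate weight) can be sketched in Python as follows. The function name `balance` and the weights are made up for illustration, and the sketch only computes the assignment; it does not run any threads.

```python
import heapq


def balance(weights, n_threads):
    """Greedy assignment: heaviest item first, always to the least-loaded thread."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    heap = [(0.0, t) for t in range(n_threads)]    # (aggregate weight, thread id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_threads)]
    for i in order:
        load, t = heapq.heappop(heap)              # thread with the minimum load
        assignment[t].append(i)
        heapq.heappush(heap, (load + weights[i], t))
    loads = [sum(weights[i] for i in items) for items in assignment]
    return assignment, loads


# Made-up CPU times of six configurations measured at the previous wear step.
weights = [7.0, 5.0, 4.0, 3.0, 3.0, 2.0]
assignment, loads = balance(weights, n_threads=3)
print(assignment, loads)
```

For these weights the total is 24, so no distribution over three threads can have a maximal load below 8; the greedy rule reaches a maximal load of 9, which is also the best achievable here because the item of weight 7 cannot be completed to exactly 8.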
Fig. 2. Fork-join model of parallel computation of wear of railway wheel profiles
4. MULTI-VARIANT PARALLEL COMPUTATIONS
The previous sections are devoted to parallel multithread computing of one numerical
experiment within an executing instance of a program, which is called a process. UM software also
supports another type of parallel computing: running a number of solvers in parallel as processes to
calculate different models or different configurations of a model. Model configurations differ from
each other in parameters (geometrical, inertial, stiffness, damping and others) or experimental
conditions such as wheel or rail profiles, railway track geometry or irregularities and vehicle speed.
Such an approach allows an engineer to specify the list of model parameters and experimental
conditions, their lower and upper limits and the number of points within the limits for each
parameter, and then to run the automatic parallel computation of the series of numerical experiments.
In typical engineering practice, there might be tens, hundreds or even thousands of numerical
experiments. In such a case, running a series of numerical experiments automatically saves hours and
days of monotonous work.
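How such a series can be generated from limits and numbers of points is sketched below in Python. The sketch assumes a full factorial grid with evenly spaced points, which is one plausible reading of the description above; the parameter names, the values and the helpers `linspace` and `variants` are hypothetical and do not reflect the UM input format.

```python
from itertools import product


def linspace(lower, upper, points):
    """Evenly spaced values including both limits."""
    if points == 1:
        return [lower]
    step = (upper - lower) / (points - 1)
    return [lower + k * step for k in range(points)]


def variants(spec):
    """spec maps a parameter name to (lower, upper, points); yield one dict per run."""
    names = list(spec)
    axes = [linspace(*spec[name]) for name in names]
    for values in product(*axes):
        yield dict(zip(names, values))


# Hypothetical parameters: 3 speeds times 2 damper coefficients = 6 experiments.
spec = {"speed_m_s": (20.0, 40.0, 3), "yaw_damper_kNs_m": (5.0, 15.0, 2)}
runs = list(variants(spec))
for run in runs:
    print(run)
```

The number of experiments is the product of the numbers of points, which is why even a handful of parameters quickly leads to hundreds of runs.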
To compute the series of numerical experiments, a control process runs one instance of the solver
many times sequentially, or several instances simultaneously, feeding each of them the source data
for the next numerical experiment. At the same time, every solver may use one or several threads for
computations. Thus, we have a combination of the multiprocess and multithread approaches. There are
several possible strategies for computing a series of numerical experiments: (1) to run one solver
sequentially with the allowed maximum of computing threads in it, (2) to run the allowed maximum of
solvers with a single computing thread in each or (3) to run several solvers with several threads in
each. A comparison of these strategies will be given in Part II of the paper. In UM software, the
maximum number of simultaneously running processes (solvers) and threads in one solver is limited by
the number of logical processors of the operating system.
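The role of the control process can be illustrated with a short Python sketch that starts every numerical experiment as a separate operating system process and keeps at most `max_solvers` of them running at a time. Everything here is a stand-in chosen for illustration: the "solver" is a one-line Python program that squares its argument, and `run_solver` and `run_series` are hypothetical names; threads inside a solver are not modelled.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Stand-in "solver": a separate Python process that squares its argument.
SOLVER = "import sys; print(float(sys.argv[1]) ** 2)"


def run_solver(value):
    """Run one solver instance as its own OS process and parse its output."""
    done = subprocess.run(
        [sys.executable, "-c", SOLVER, str(value)],
        capture_output=True, text=True, check=True,
    )
    return float(done.stdout)


def run_series(values, max_solvers):
    """Control process: keep at most max_solvers solver processes alive at a time."""
    with ThreadPoolExecutor(max_workers=max_solvers) as pool:
        return list(pool.map(run_solver, values))


results = run_series([1.0, 2.0, 3.0, 4.0], max_solvers=2)
print(results)
```

With `max_solvers=1` the sketch degenerates to the sequential strategy (1); larger values correspond to strategies (2) and (3), depending on how many threads each solver would use internally.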
UM software includes a special service for distributed calculations named UM Cluster. It allows
using all the computational power of a local/corporate network for the execution of series of
numerical experiments, which decreases the computation time correspondingly. This possibility is
easy and effective to use within computer centers and laboratories. The service of distributed
calculations is based on TCP/IP, which allows employing any computer for the needs of a project,
not only in the local network but also in an intranet or the Internet. It significantly decreases
the total time for computation of big series of numerical experiments.
The service consists of two parts: a server and a client. The server part works on the head
computer and controls the execution: it sends jobs and receives results back. The client part runs
on the peripheral computers; it gets and computes jobs and sends results to the server.
Remote installation and uninstallation of the client parts on client computers are supported. Remote
installation of the software components on a client computer runs automatically without disturbing
remote users. There is a built-in tool to determine all available computers in a local network, which
simplifies the initial procedures of searching for and adding client computers to the computational
cluster.
5. CO-SIMULATION
Computer simulation in railway research sometimes requires a combination of purely multibody
models that Universal Mechanism deals with and associated models of control systems, power
electrical machines, hydraulic and pneumatic elements etc.; see, for instance, the article by Wang
et al. [45]. Such complex systems cannot be described with the help of UM built-in elements and tools
and should be simulated with the help of special-purpose codes. UM software supports several
interfaces for this purpose: an interface with Matlab/Simulink [46] and SimInTech [47], and an
interface with a dynamic-link library (DLL) developed by the user in any programming language.
Users can then simulate their systems' full-motion behaviour from within the Simulink or
SimInTech environment or include their models into the UM environment and visualize the results
using animations and plotting.
UM software supports two different co-simulation techniques. The first technique consists in
exporting the Simulink or SimInTech model as a DLL, then loading the DLL within the model context
and setting connections between the mechanical part and the model exported as the DLL. Such an
approach also allows users to develop their own dynamic-link library using any programming
environment that supports generating DLLs. In this sense, it is a universal way to plug the user's
code and algorithms into a UM model.
The second technique consists in exporting a UM model from UM for subsequent integration into a
Simulink or SimInTech model. To import UM models into Simulink, the standard S-Function element
is used. The S-Function element includes code that is automatically generated by UM and uses UM as
a COM server to send and get signals. To import UM models into a SimInTech model, the special 'UM
model' block is used.
Universal Mechanism is also distributed as a Component Object Model (COM) library for use in
third-party applications including Simulink. COM is a binary-interface standard for software
components introduced by Microsoft in 1993. It is used to enable inter-process communication and
object creation in a large range of programming languages [48]. The COM interfaces implemented in UM
allow a user to load UM models, change their parameters and simulate the dynamics of the model in a
co-simulation mode, getting kinematical data of the mechanical model for external visualization and
sending control signals. A typical way of using the UM COM interfaces is given in Fig. 3.
Universal Mechanism as a COM server for dynamical analysis was used in a number of projects
including two train driving real-time simulators [24, 49].
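The stepwise exchange between a mechanical model and an external control part can be sketched in Python as follows. This is not the UM COM interface: `ToyModel`, `controller` and `cosimulate` are hypothetical stand-ins, the mechanical model is a unit mass on a spring integrated with the semi-implicit Euler method, and the external part is a plain damper acting on the measured velocity.

```python
class ToyModel:
    """Stand-in for a mechanical model: unit mass on a spring."""

    def __init__(self, stiffness=4.0):
        self.k = stiffness
        self.x = 1.0     # position
        self.v = 0.0     # velocity
        self.t = 0.0     # model time

    def step(self, dt, control_force):
        """Advance the model by one step with the control force held constant."""
        self.v += dt * (control_force - self.k * self.x)
        self.x += dt * self.v
        self.t += dt


def controller(x, v):
    """External control part: a damper acting on the measured velocity."""
    return -2.0 * v


def cosimulate(model, dt, n_steps):
    history = []
    for _ in range(n_steps):
        force = controller(model.x, model.v)    # apply user / external control
        model.step(dt, force)                   # perform the integration step
        history.append((model.t, model.x))      # data that a visualization would read
    return history


trace = cosimulate(ToyModel(), dt=0.01, n_steps=1000)
print(trace[-1])
```

The two parts only exchange signals at the step boundaries, which is the design choice that lets the mechanical solver and the external tool be developed and run independently.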
6. CONCLUSION
A detailed model of parallelization in the simulation of multibody systems containing both rigid and
flexible bodies is described in the paper. The model includes equations of motion, whose main
feature is the replacement of rigid joints by compliant ones; a formulation of PCGM with a very
efficient preconditioning; a description of parallel sections for the fork-join parallelism; and an
algorithm for improving thread balancing in parallel sections, which is important when the
simulated model includes flexible bodies along with rigid ones. As will be shown in Part II of
this publication, the method allows a considerable speed-up of simulation on multicore processors,
both for relatively simple models of rail vehicles with about one hundred DOF and for much more
complex models with tens of thousands of DOF. The algorithm efficiency is based on the use of shared
memory parallelism, on the parallelization of all stages of computations from the prediction up to
the solution of linear equations, and on the good balancing of threads within all of the parallel
sections.
Fig. 3. Using UM COM server in the third-party application
The implementation of the fork-join model for the parallel approach to predicting the wear of railway
wheel profiles is described in the paper. The efficiency of this algorithm will be shown in Part II.
Parallel multi-variant computations as well as UM interfaces with Matlab/Simulink and SimInTech
are briefly discussed.
ACKNOWLEDGMENTS
The research was supported by the Russian Foundation for Basic Research under
grant 17-01-00815a.
[Fig. 3 flowchart: load model from file; check the model; set time t = 0; loop: apply user / external
control, perform the integration step, increment time t_i = t_i + Δt, get current position and
orientation of bodies; finish integration. The interactive part comprises visualization (monitor, TV,
projector) and user control (joystick, control panel).]
References
1. Wu, Q. & Spiryagin, M. & Cole, C. & et al. Parallel computing in railway research. International
Journal of Rail Transportation. 2018.
2. Featherstone, R. A divide-and-conquer articulated body algorithm for parallel O (log (n))
calculation of rigid body dynamics. Part 1: Basic algorithm. The International Journal of Robotics
Research. 1999. Vol. 18. No. 9. P. 867-875.
3. Featherstone, R. A divide-and-conquer articulated body algorithm for parallel O (log (n))
calculation of rigid body dynamics. Part 2: Trees, loops, and accuracy. The International Journal
of Robotics Research. 1999. Vol. 18. No. 9. P. 876-892.
4. Fijany, A. & Sharf, I. & D’Eleuterio, G.M.T. Parallel O (log N) algorithms for computation of
manipulator forward dynamics. IEEE Transactions on Robotics and Automation. 1995. Vol. 11.
No. 3. P. 389-400.
5. Critchley, J.H. & Anderson, K.S. & Binani, A. An efficient multibody divide and conquer
algorithm and implementation. Journal of Computational and Nonlinear Dynamics. 2009. Vol. 4.
No. 2. 021001.
6. Fijany, A. & Featherstone, R. A new factorization of the mass matrix for optimal serial and
parallel calculation of multibody dynamics. Multibody System Dynamics. 2013. Vol. 29. No. 2.
P. 169-187.
7. Khan, I.M. & Anderson, K.S. A logarithmic complexity divide-and-conquer algorithm for multi-
flexible-body dynamics including large deformations. Multibody System Dynamics. 2015. Vol. 34.
No. 1. P. 81-101.
8. Chadaj, K. & Malczyk, P. & Fraczek, J. A parallel Hamiltonian formulation for forward dynamics
of closed-loop multibody systems. Multibody System Dynamics. 2017. Vol. 39. No. 1-2. P. 51-77.
9. Liu, F. & Zhang, J. & Hu, Q. A modified constraint force algorithm for flexible multibody
dynamics with loop constraints. Nonlinear Dynamics. 2017. Vol. 90. No. 3. P. 1885-1906.
10. Malczyk, P. & Fraczek, J. & González, F. & et al. Index-3 divide-and-conquer algorithm for
efficient multibody system dynamics simulations: theory and parallel implementation. Nonlinear
Dynamics. 2019. Vol. 95. No. 1. P. 727-747.
11. Iglberger, K. & Rüde, U. Massively parallel rigid body dynamics simulations. Computer Science-
Research and Development. 2009. Vol. 23. No. 3-4. P. 159-167.
12. Negrut, D. & Serban, R. & Mazhar, H. & et al. Parallel computing in multibody system dynamics:
why, when and how. Journal of Computational and Nonlinear Dynamics. 2014. Vol. 9. No. 4.
041007.
13. Khatibi, F. & Esmaeili, M. & Mohammadzadeh, S. DEM analysis of railway track lateral
resistance. Soils and Foundations. 2017. Vol. 57. No. 4. P. 587-602.
14. Tutumluer, E. & Qian, Y. & Hashash, Y.M.A. & et al. Discrete element modelling of ballasted
track deformation behavior. International Journal of Rail Transportation. 2013. Vol. 1. No. 1-2.
P. 57-73.
15. Fleissner, F. & Lehnart, A. & Eberhard, P. Dynamic simulation of sloshing fluid and granular
cargo in transport vehicles. Vehicle system dynamics. 2010. Vol. 48. No. 1. P. 3-15.
16. Fisette, P. & Peterkenne, J.M. Contribution to parallel and vector computation in multibody
dynamics. Parallel Computing. 1998. Vol. 24. P. 717-728.
17. Postiau, T. & Sass, L. & Fisette, P. & et al. High-performance multibody models of road vehicles:
Fully symbolic implementation and parallel computation. 20th International Conference of
Theoretical and Applied Mechanics. Chicago, 2000. In: Vehicle System Dynamics: International
Journal of Vehicle Mechanics and Mobility. 2001. Vol. 35. P. 57-83.
18. Anderson, K.S. & Duan, S. Highly parallelizable low order dynamics simulation algorithm for
multi-rigid-body systems. Journal of Guidance, Control, and Dynamics. 2000. Vol. 23. No. 2.
P. 355-364.
19. Гетманский, В.В. & Горобцов, А.С. Решение задач большой размерности в системах
моделирования многотельной динамики с использованием параллельных вычислений.
Известия Волгоградского государственного технического университета. 2007. No. 9(35).
P. 10-12. [In Russian: Getmansky, V.V. & Gorobtsov, A.S. Large-scale tasks solving in multibody
dynamics simulation systems using parallel computing. Proceedings of Volgograd State Technical
University].
20. Orlandea, N. & Chace, M.A. & Calahan, D.A. A sparsity oriented approach to the dynamic
analysis and design of mechanical systems Part I. Journal of Engineering for Industry. 1977.
Vol. 99. No. 3. P. 773-784.
21. Haug, E.J. Computer-Aided Kinematics and Dynamics of Mechanical Systems Volume-I. Boston:
Allyn and Bacon. 1989. 511 p.
22. Valasek, M. & Mraz, L. Parallelization of multibody system dynamics by heterogeneous
multiscale method. In: Proceedings of Multibody Dynamics 2011, ECCOMAS Thematic
Conference. Brussels. 2011.
23. Mraz, L. & Valasek, M. Solution of three key problems for massive parallelization of multibody
dynamics. Multibody System Dynamics. 2013. Vol. 29. No. 1. P. 21-39.
24. Pogorelov, D. & Yazykov, V. & Lysikov, N. & et al. Train 3D: the technique for inclusion of
three dimensional models in longitudinal train dynamics and its application in derailment studies
and train simulators. Vehicle System Dynamics. 2017. Vol. 55. No. 4. P. 583-600.
25. Universal Mechanism. Bryansk: Laboratory of Computational Mechanics. Available at:
https://www.universalmechanism.com/.
26. Pogorelov, D.Y. Jacobian matrices of the motion equations of a system of bodies. Journal of
Computer and Systems Sciences International. 2007. Vol. 46. No. 4. P. 563-577.
27. Pogorelov, D.Y. Simulation of constraints by compliant joints. Journal of Computer and Systems
Sciences International. 2011. Vol. 50. No. 1. P. 158-173.
28. Shewchuk, J.R. An introduction to the conjugate gradient method without the agonizing pain.
Pittsburgh: Carnegie-Mellon University. 1994. 58 p.
29. Eberhard, P. Kontaktuntersuchungen durch hybride Mehrkörpersystem/Finite Elemente
Simulationen. [In German: Contact investigations by hybrid multibody system / finite element
simulations]. Aachen: Shaker Verlag. 2000. 318 p.
30. Saad, Y. Iterative methods for sparse linear systems. 2nd ed. Philadelphia: SIAM. 2003. 528 p.
31. Kurdila, A. & Menon, R.G. & Sunkel, J.W. Nonrecursive Order N formulation of multibody
dynamics. Journal of Guidance, Control, and Dynamics. 1993. Vol. 16. No. 5. P. 838-844.
32. Sharf, I. & D’Eleuterio, G.M.T. An Iterative Approach to Multibody Simulation Dynamics
Suitable for Parallel Implementation. Journal of Dynamic Systems, Measurement, and Control.
1993. Vol. 115. No. 4. P. 730-735.
33. Piotrowski, J. & Kik, W. A simplified model of wheel/rail contact mechanics for non-Hertzian
problems and its application in rail vehicle dynamic simulations. Vehicle System Dynamics. 2008.
Vol. 46. No. 1-2. P. 27-48.
34. Zobory, I. Prediction of Wheel/Rail Profile Wear. Vehicle System Dynamics. 1997. Vol. 28. No. 2-
3. P. 221-259.
35. Goryacheva, I.G. & Zakharov, S.M. & Soshenkov, S.N. & et al. Tribodynamic simulation of
wheel/rail profile evolution and contact-fatigue damage accumulation for some variable track and
vehicle parameters. Vniizht Bulletin (Railway Research Institute Bulletin). 2011. No. 1. P. 13-19.
36. Von Dist, K. & Ferrarotti, G. & Kik, W. & et al. Wear analysis of the high-speed-grinding Vehicle
HSG-2: validation, simulation and comparison with measurements. In: 25th International
Symposium on Dynamics of Vehicles on Roads and Tracks. Rockhampton, 2017.
37. Auciello, J. & Ignesti, M. & Marini, L. & et al. Development of a model for the analysis of wheel
wear in railway vehicles. Meccanica. 2013. Vol. 48. No. 3. P. 681-697.
38. Craig, R. & Bampton, M. Coupling of substructures for dynamic analyses. AIAA Journal. 1968.
Vol. 6. No. 7. P. 1313-1319.
39. Simeon, B. DAEs and PDEs in elastic multibody systems. Numerical Algorithms. 1998. Vol. 19.
P. 235-246.
40. Hairer, E. & Wanner, G. Solving ordinary differential equations II. Stiff and differential-algebraic
problems. Berlin: Springer. 1996. 614 p.
41. Park, K.C. An improved stiff stable method for direct integration of nonlinear structural dynamic
equations. Journal of Applied Mechanics. 1975. Vol. 42. No. 2. P. 464-470.
42. Nyman, L. & Laakso, M. Notes on the History of Fork and Join. IEEE Annals of the History of
Computing. 2016. Vol. 38. No. 3. P. 84-87.
43. Archard, J.F. Contact and Rubbing of Flat Surfaces. Journal of Applied Physics. 1953. Vol. 24.
No. 8. P. 981-988.
44. Vollebregt, E.A.H. User guide for CONTACT, rolling and sliding contact with friction (Technical
Report TR09-03). Delft: VORtech CMCC. 2018. 141 p.
45. Wang, H. & Deng, Z. & Ma, S. & Sun, R. & Li, H. & Li, J. Dynamic Simulation of the HTS
Maglev Vehicle-Bridge Coupled System Based on Levitation Force Experiment. IEEE
Transactions on Applied Superconductivity. 2019. Vol. 29. No. 5. P. 1-6. Art no. 3601606. DOI:
10.1109/TASC.2019.2895503.
46. Matlab/Simulink. Available at: https://www.mathworks.com.
47. SimInTech. Available at: simintech.ru.
48. Wikipedia: Component Object Model. Available at:
https://en.wikipedia.org/wiki/Component_Object_Model.
49. Öztürk, V. & Arar, Ö.F. & Rende, F.Ş. & et al. Validation of railway vehicle dynamic models in
training simulators. Vehicle System Dynamics. 2017. Vol. 55. No. 1. P. 41-71.
Received 10.05.2018; accepted in revised form 05.09.2019