All content in this area was uploaded by Michael T. Wolfinger on Aug 03, 2023
Abstract
The Vienna RNA Package is a state-of-the-art tool for predicting and comparing RNA secondary structures. Its most frequently used function is secondary structure
prediction, which uses the minimum free energy algorithm [1].
Because placing more transistors on a single chip and raising the clock speed both have inherent limits, computer architects have returned to an idea from the
1980s: multiple processors that share memory are combined into a single computer or even a single chip [2].
To gain an overview of the possibilities for parallelization, the simplest RNA folding algorithm was used for pre-testing.
References
[1] Zuker, M. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information (1981)
[2] Chapman, B. et al. Using OpenMP: Portable Shared Memory Parallel Programming (2007)
[3] Nussinov, R. et al. Algorithms for loop matchings (1978)
Conclusion
Both possible matrix fill strategies were implemented in two small C programmes and parallelized with OpenMP. The diagonal fill method is
straightforward to parallelize because the cells on one diagonal are independent of each other. The row by row method needs extra synchronisation, often referred to as pipelined
execution. The diagonal fill method is faster and has a lower overhead, so it should be used on multi core processors. The row by row strategy, on the other hand,
makes better use of the cache and should therefore be used on single core processors.
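The diagonal fill can be sketched as follows. This is a minimal illustration, not the programme used for the measurements: the function names, the flat matrix layout and the pairing rule (Watson-Crick pairs plus G-U, no minimum loop length) are assumptions made for the example. Only the `parallel for` directive and its `schedule` clause are standard OpenMP.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed pairing rule: Watson-Crick pairs plus the G-U wobble pair. */
static int can_pair(char a, char b)
{
    return (a == 'A' && b == 'U') || (a == 'U' && b == 'A') ||
           (a == 'G' && b == 'C') || (a == 'C' && b == 'G') ||
           (a == 'G' && b == 'U') || (a == 'U' && b == 'G');
}

static int max2(int a, int b) { return a > b ? a : b; }

/* Score of cell (i, j), i < j, computed from cells of shorter subsequences.
 * N is an n x n matrix stored row by row; cells that are never written stay 0. */
static int cell(const char *seq, const int *N, int n, int i, int j)
{
    int best = max2(N[(i + 1) * n + j], N[i * n + j - 1]);       /* r_i or r_j unpaired */
    best = max2(best, N[(i + 1) * n + j - 1] + can_pair(seq[i], seq[j])); /* paired */
    for (int k = i + 1; k < j; k++)                              /* two substructures */
        best = max2(best, N[i * n + k] + N[(k + 1) * n + j]);
    return best;
}

/* Hypothetical name: maximum number of base pairs, filled diagonal by diagonal.
 * Returns -1 if the matrix cannot be allocated. */
int nussinov_diagonal(const char *seq)
{
    int n = (int)strlen(seq);
    if (n < 2) return 0;
    int *N = calloc((size_t)n * n, sizeof *N);
    if (!N) return -1;

    for (int d = 1; d < n; d++) {          /* diagonals, shortest spans first */
        /* Cells on diagonal d only read diagonals < d, so the iterations are
         * independent. The implicit barrier at the end of the loop finishes
         * diagonal d before diagonal d + 1 starts. */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n - d; i++)
            N[i * n + i + d] = cell(seq, N, n, i, i + d);
    }
    int result = N[n - 1];                 /* upper right corner, N[0][n-1] */
    free(N);
    return result;
}
```

Compiled without OpenMP support the pragma is ignored and the function runs sequentially with the same result. For example, `nussinov_diagonal("GGGAAACCC")` yields 3 under the pairing rule assumed here.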
Results
The following results were computed on a machine with two Intel(R) Xeon(R) E5450
quad core processors (3.00 GHz) and 32 GB of main memory.
The threads in figure 4.b have to wait for each other, so the speedup
of the row by row fill strategy is much lower than that of the diagonal one. The
sequence also has to be longer before any speedup is achieved.
In general, the longer the sequence, the higher the benefit of parallelization.
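One way such waiting can arise in a row by row fill is sketched below. It is an assumed scheme, not the poster's code: each row is one loop iteration, rows are handed out round robin, and a hypothetical progress array `done` records the last finished column of every row. A thread may compute cell (i, j) only after row i + 1 has finished column j. The pairing rule (Watson-Crick plus G-U) and all names are illustrative.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed pairing rule: Watson-Crick pairs plus the G-U wobble pair. */
static int can_pair(char a, char b)
{
    return (a == 'A' && b == 'U') || (a == 'U' && b == 'A') ||
           (a == 'G' && b == 'C') || (a == 'C' && b == 'G') ||
           (a == 'G' && b == 'U') || (a == 'U' && b == 'G');
}

static int max2(int a, int b) { return a > b ? a : b; }

/* Score of cell (i, j), i < j; N is an n x n matrix stored row by row. */
static int cell(const char *seq, const int *N, int n, int i, int j)
{
    int best = max2(N[(i + 1) * n + j], N[i * n + j - 1]);
    best = max2(best, N[(i + 1) * n + j - 1] + can_pair(seq[i], seq[j]));
    for (int k = i + 1; k < j; k++)
        best = max2(best, N[i * n + k] + N[(k + 1) * n + j]);
    return best;
}

/* Hypothetical name: row by row fill, bottom row first, pipelined over rows.
 * Returns -1 if memory cannot be allocated. */
int nussinov_rows_pipelined(const char *seq)
{
    int n = (int)strlen(seq);
    if (n < 2) return 0;
    int *N = calloc((size_t)n * n, sizeof *N);
    int *done = malloc((size_t)n * sizeof *done);
    if (!N || !done) { free(N); free(done); return -1; }
    for (int i = 0; i < n; i++)
        done[i] = i;                       /* the main diagonal is ready */

    /* schedule(static, 1) deals rows out round robin in bottom-up order, so
     * the row a thread waits for is always already in progress or finished. */
    #pragma omp parallel for schedule(static, 1)
    for (int r = 1; r < n; r++) {
        int i = n - 1 - r;                 /* r = 1 is the second-to-last row */
        for (int j = i + 1; j < n; j++) {
            int ready;
            do {                           /* wait until row i+1 reached column j */
                #pragma omp atomic read
                ready = done[i + 1];
            } while (ready < j);
            #pragma omp flush              /* make the other rows' cells visible */
            N[i * n + j] = cell(seq, N, n, i, j);
            #pragma omp flush              /* publish the cell before the progress */
            #pragma omp atomic write
            done[i] = j;
        }
    }
    int result = N[n - 1];                 /* upper right corner, N[0][n-1] */
    free(N);
    free(done);
    return result;
}
```

Synchronising after every single cell keeps the sketch short but makes the waiting cost very visible. Exchanging progress only once per block of columns would be a natural refinement.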
Figure 5: Overhead – The overhead is any indirect computation time, I/O or other
resource that is expended to achieve the goal of the algorithm.
5.a diagonal 5.b row by row
Figure 4: Relative Speedup – The relative speedup shows how much faster a parallel
programme runs on several processors than it does on just one. The gray line at 8
threads marks the number of physical processor cores.
4.a diagonal 4.b row by row
Figure 5 shows that parallelism does not come for free. Additional overhead is incurred,
e.g. for creating, starting and stopping threads or for waiting for other threads to finish.
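The two quantities plotted in figures 4 and 5 can be computed from measured wall-clock times along the following lines. The relative speedup follows the definition in the caption of figure 4. The overhead formula, p * T(p) - T(1), is one common textbook definition and is an assumption here, since the poster does not state how its overhead was measured. All function names are hypothetical.

```c
/* t1: run time of the parallel programme on one thread,
 * tp: run time of the same programme on p threads (same unit). */

/* Relative speedup S(p) = T(1) / T(p). */
double relative_speedup(double t1, double tp)
{
    return t1 / tp;
}

/* Efficiency E(p) = S(p) / p; 1.0 would be a perfect linear speedup. */
double efficiency(double t1, double tp, int p)
{
    return t1 / (p * tp);
}

/* Assumed overhead definition: processor time spent by all p threads
 * beyond what the single-thread run needed. */
double total_overhead(double t1, double tp, int p)
{
    return p * tp - t1;
}
```

With made-up times of 8 s on one thread and 2 s on eight threads (not measurements from this work), the speedup is 4, the efficiency 0.5 and the overhead 8 s.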
Parallelization of RNA Folding Algorithms
for Multi Core Processors
Daniel Hooker (1), Michael T. Wolfinger (1), Ivo L. Hofacker (2)
(1) FH Campus Wien
(2) Institute for Theoretical Chemistry and Structural Biology, University of Vienna, Austria
hook@tbi.univie.ac.at
The maximum number of base pairs for the whole sequence is
found in the upper right corner of the matrix. Every cell depends only
on cells for shorter subsequences, so there are two different
matrix fill strategies (figure 3) and, accordingly, two different ways of
parallelization.
Figure 2: The Nussinov Algorithm – Cases (i) and (ii) cover the
possibility that base r_i or base r_j is unpaired. Case (iii) covers the
possibility that r_i and r_j are paired with each other. The last case (iv) covers the possibility
that the subsequence splits into two substructures.
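Written as a recurrence, the four cases take the following form in the usual textbook presentation of the algorithm. The figure itself is not reproduced here, so the symbols are assumptions: N_{i,j} is the maximum number of base pairs on the subsequence r_i ... r_j, and delta(r_i, r_j) is 1 if the two bases can pair and 0 otherwise.

```latex
N_{i,j} = \max
\begin{cases}
N_{i+1,j} & \text{(i) } r_i \text{ unpaired} \\
N_{i,j-1} & \text{(ii) } r_j \text{ unpaired} \\
N_{i+1,j-1} + \delta(r_i, r_j) & \text{(iii) } r_i \text{ and } r_j \text{ paired} \\
\max_{i<k<j} \left( N_{i,k} + N_{k+1,j} \right) & \text{(iv) two substructures}
\end{cases}
\qquad N_{i,i} = N_{i,i-1} = 0
```

Every term on the right-hand side refers to a subsequence shorter than r_i ... r_j, which is what allows the matrix to be filled in more than one order.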
Maximum Matching
According to the “Nussinov Algorithm” [3], the structure with the
maximum number of base pairs is taken to be the most stable one
(figure 2).
A single-stranded RNA spontaneously forms secondary
structure elements to reduce its free energy before forming
the tertiary structure (figure 1). These structures are formed
by hydrogen bonds between bases.
Figure 1: RNA Structure – This figure shows how the tertiary structure
of a tRNA is formed from secondary structure elements.
Figure 3: Matrix Fill Strategies – The first matrix shows the diagonal
strategy and the second matrix shows the row by row strategy. Red
marks the cells computed by thread 0 and green those computed by thread 1.
3.a diagonal 3.b row by row