Accelerating Force-Directed Graph Drawing with RT Cores


Stefan Zellmann*
University of Cologne
Martin Weier
Hochschule Bonn-Rhein-Sieg
Ingo Wald
Figure 1: Drawing a Twitter feed graph (68K vertices, 101K edges) with a force-directed algorithm using RT cores. The images
show the results after N=1 (left), N=100 (second from left), N=2,000 (second from right), and N=12,000 (right) iterations.
We can generate these layouts in 0.003, 0.43, 7.35, and 39.3 seconds, and outperform a typical CUDA software implementation
by 10.2×, 7.44×, 9.6×, and 10.9×, respectively.
Graph drawing with spring embedders employs a V × V computation phase over the graph's vertex set to compute repulsive forces. Here, the efficacy of forces diminishes with distance: a vertex can effectively only influence other vertices in a certain radius around its position. Therefore, the algorithm lends itself to an implementation using search data structures to reduce the runtime complexity. NVIDIA RT cores implement hierarchical tree traversal in hardware. We show how to map the problem of finding graph layouts with force-directed methods to a ray tracing problem that can subsequently be implemented with dedicated ray tracing hardware. With that, we observe speedups of 4× to 13× over a CUDA software implementation.
Index Terms: Human-centered computing—Visualization—Visualization techniques—Graph drawings; Computing methodologies—Computer graphics—Rendering—Ray tracing;
Graph drawing is concerned with finding layouts for graphs and networks while adhering to particular aesthetic criteria [7, 32]. These can, for example, be minimal edge crossings, grouping by connected components or clusters, and obtaining a uniform edge length. Force-directed algorithms [8, 23] associate forces with the vertices and edges and iteratively apply those to the layout until equilibrium is reached and the layout becomes stationary.
Spring embedders, as one representative of force-directed algorithms, iteratively apply repulsive and attractive forces to the graph layout. The repulsive force computation phase requires O(|V|²) time over the graph's vertex set V. This phase can be optimized using data structures like grids or quadtrees, as the mutually applied forces effectively only affect vertices within a certain radius.
In this paper, we show how the task of finding all vertices within a given radius can also be formulated as a ray tracing problem. This approach not only yields a simpler solution by leaving the problem of efficient data structure construction to the API, but also allows for leveraging hardware-accelerated NVIDIA RTX ray tracing cores (RT cores).
In the following, we provide background and discuss related work on force-directed graph drawing algorithms. We also give an introduction to NVIDIA RTX and prior work.
2.1 Force-directed graph drawing
We consider graphs G = (V, E) with vertex set V and edge set E. Each v ∈ V has a position p(v) ∈ ℝ². Edges e ∈ E, e = {u, v} with u, v ∈ V, are undirected and unweighted. The Fruchterman-Reingold (FR) algorithm [9] (see Alg. 1) calculates the dispersion to displace each vertex based on the forces. A dampening factor is used to slow down the forces with an increasing number of iterations. Repulsive forces are computed for each pair of vertices (u, v) ∈ V × V. Attractive forces only affect those pairs that are connected by an edge. The following force functions are used:

F_rep(Δ, k) = (Δ/|Δ|) · (k²/|Δ|),   (1)

F_att(Δ, k) = (Δ/|Δ|) · (|Δ|²/k),   (2)

where Δ = p(v) − p(u) is the vector between the two vertices acting forces upon each other. k is computed as √(A/|V|), where A is the area of the axis-aligned bounding rectangle of V.
As the complexity of the first nested for loop per iteration is O(|V|²), and by observing that the pairwise forces diminish with increasing distance between vertices, the authors propose to adapt the computation of the repulsive force using:

F_rep(Δ, k) = (Δ/|Δ|) · (k²/|Δ|) · u(2k − |Δ|),   (3)
Algorithm 1 Fruchterman-Reingold spring embedder algorithm.
procedure SPRINGEMBEDDER(G(V, E), Iterations, k)
  for i := 1 to Iterations do
    D ← |V| zero vectors                  ▷ dispersion to displace vertices
    for all v ∈ V do                      ▷ calculate repulsive forces (V × V)
      for all u ∈ V, u ≠ v do
        D(v) := D(v) + F_rep(p(v) − p(u), k)
      end for
    end for
    for all e = {u, v} ∈ E do             ▷ calculate attractive forces
      D(v) := D(v) − F_att(p(v) − p(u), k)
      D(u) := D(u) + F_att(p(u) − p(v), k)
    end for
    for all v ∈ V do                      ▷ displace vertices according to forces
      DISPLACE(v, D(v), t)                ▷ t is a dampening factor
    end for
    t := COOL(t)                          ▷ decrease dampening factor
  end for
end procedure
where u(x) is 1 if x > 0 and 0 otherwise. With that, only vertices inside a radius 2k will have a non-zero contribution, which in turn allows for employing acceleration data structures to focus computations on only vertices within the neighborhood of p(v).
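For illustration, the force functions and the cutoff can be sketched in a few lines of Python (our sketch; the helper names are not from the paper):

```python
import math

def f_rep(delta, k):
    """Repulsive force: along delta with magnitude k^2 / |delta|."""
    d = math.hypot(*delta)
    s = (k * k) / (d * d)              # (1/d) * (k^2 / d)
    return (delta[0] * s, delta[1] * s)

def f_att(delta, k):
    """Attractive force: along delta with magnitude |delta|^2 / k."""
    d = math.hypot(*delta)
    s = d / k                          # (1/d) * (d^2 / k)
    return (delta[0] * s, delta[1] * s)

def f_rep_cutoff(delta, k):
    """Adapted repulsive force, Eq. (3): the step function u(2k - |delta|)
    zeroes out contributions from vertices outside the 2k radius."""
    d = math.hypot(*delta)
    if d >= 2.0 * k:
        return (0.0, 0.0)
    return f_rep(delta, k)
```

Inside the 2k radius, the adapted force agrees with the original repulsive force; outside it contributes nothing, which is exactly what allows a search data structure to skip distant vertices.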
The FR algorithm is a good match for GPUs as the three phases—repulsive force computation, attractive force computation, and vertex displacement—are highly parallel. The most apparent parallelization described by Klapka and Slaby [25] devotes one GPU kernel to each phase. The outer dimension of the nested for-loop over v ∈ V is executed in parallel, but each GPU thread runs the full inner loop over u ∈ V in Alg. 1. This reduces the time complexity to Θ(|V|), whereas the work complexity remains Θ(|V|²).
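A CPU analogue of this scheme (our NumPy sketch, not the paper's CUDA kernels) makes the distinction visible: each row of the pairwise computation corresponds to one GPU thread's full inner loop, so the rows are independent while the total work stays quadratic:

```python
import numpy as np

def repulsive_dispersion(pos, k):
    """All-pairs repulsive dispersion, Theta(|V|^2) work.

    pos is an (n, 2) array of vertex positions. Row v of the pairwise
    tensors corresponds to one thread's full inner scan over all u.
    """
    delta = pos[:, None, :] - pos[None, :, :]   # delta[v, u] = p(v) - p(u)
    dist = np.linalg.norm(delta, axis=-1)
    np.fill_diagonal(dist, np.inf)              # exclude the u == v term
    scale = (k * k) / (dist * dist)             # force magnitude over distance
    return (delta * scale[..., None]).sum(axis=1)
```

With |V| parallel workers, the |V| independent rows finish in Θ(|V|) parallel steps, matching the time/work distinction above.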
Force-directed algorithms—and in general graph drawing algorithms based on nearest neighbor search—lend themselves well to massive parallelization on distributed systems [1, 21] or on many-core systems and GPUs [17, 31, 33].
Gajdoš et al. [10] accelerate the repulsive force computation phase by initially sorting the v ∈ V on a Morton curve. This order is subdivided into individual blocks to be processed in parallel in separate CUDA kernels. However, this process is inaccurate, as forces will only affect vertices from the same block. The authors try to account for that by randomly jittering vertex positions so that some of them spill over to neighboring blocks. Mi et al. [29] use a similar approximation but motivate that by imbalances originating from the multi-level approach described in [18] that they use in combination with FR. Our approach does not use approximations but is equivalent to the FR algorithm using the grid optimization that was proposed in the original work.
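The Morton (Z-order) sorting that underlies this block decomposition (and also LBVH construction [24, 27]) interleaves the bits of quantized 2-d coordinates. A minimal sketch, with function names of our choosing:

```python
def part1by1(n):
    """Spread the lower 16 bits of n so a zero bit separates each bit."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def morton2d(x, y):
    """Interleave 16-bit integer coordinates into a 32-bit Morton code."""
    return part1by1(x) | (part1by1(y) << 1)
```

Sorting vertices by the Morton code of their quantized positions places spatially nearby vertices close together in the sorted order, which is what the block decomposition of [10] exploits.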
General nearest neighbor queries have been accelerated on the GPU with k-d trees, as in the work of Hu et al. [22] and by Wehr and Radkowski [37]. For dense graphs with O(|E|) = O(|V|²), the attractive force phase can also become a bottleneck. The works by Brandes and Pich [5] and by Gove [15] propose to choose only a subset of E using sampling to compute the attractive forces. Gove also suggests using sampling for the graph's vertex set V to improve the complexity of the repulsive force phase [16]. Other modifications to the stress model exist. The COAST algorithm by Gansner et al. [12] extends force-directed algorithms to support given, non-uniform edge lengths. They reformulate the stress function based on those edge lengths so that it can be solved using semi-definite programming. The maxent-stress model by Gansner et al. [13] initially solves the model only for the edge lengths and later resolves the remaining degrees of freedom via an entropy maximization model. The repulsive force computation in this work is based on the classical N-body model by Barnes and Hut [3] and uses a quadtree data structure for the all-pairs comparison. Hachul and Jünger [20] gave a survey of force-directed algorithms for large graphs. For a general overview of force-directed graph drawing algorithms, we refer the reader to the book chapter by Kobourov [26].
2.2 RTX ray tracing
NVIDIA RTX APIs allow the user to test for intersections of rays and arbitrary geometric primitives. This technique is often used to generate raster images. Here, bounding volume hierarchies (BVHs) help to reduce the complexity of this test, which is otherwise proportional to the number of rays times the number of primitives. The user supplies a bounds program so that RTX can generate axis-aligned bounding boxes (AABBs) for the user geometry and build a BVH. Now, a ray generation program can be executed on the GPU's programmable shader cores that will trace rays through the BVH using an API call. In the intersection program, which is called when rays hit the AABBs, the user can test for and potentially report an intersection with the geometry. A reported intersection will then be available in potential closest-hit or any-hit programs. RTX GPUs perform BVH traversal in hardware. When RTX calls an intersection program, hardware traversal is interrupted and a context switch occurs that switches execution to the shader cores.
RTX was recently used to accelerate visualization algorithms
like direct volume rendering [30] or glyph rendering [39]. RT cores
have, however, also been used for non-rendering applications, such
as the point location method on tetrahedral elements presented by
Wald et al. [36].
We propose to reformulate the FR algorithm as a ray tracing problem. That way, we can use an RTX BVH to accelerate the nearest neighbor query during the repulsive force computation phase. The queries and data structures used by the two algorithms differ substantially: force-directed algorithms use spatial subdivision data structures, whereas RTX uses object subdivision. Nearest neighbor queries do not directly map to the ray / primitive intersection query supported by RTX. However, we present a mapping from one approach to the other and demonstrate its effectiveness using an FR implementation with the CUDA GPU programming interface.
3.1 Mapping the force-directed graph drawing problem to a ray tracing problem
We present a high-level overview of our approach in Fig. 2. A nearest neighbor query can be performed by expanding a circle around the position p(v) of the vertex v ∈ V that we are interested in and gathering all u ∈ V, u ≠ v, inside that circle. To compute forces, we would perform that search query for all v ∈ V and would integrate the accumulation of the forces directly into the query.

By observing that the circle we expand around v always has a radius 2k, we can reverse the problem: instead of expanding a circle around v, we instead expand circles around all v ∈ V. We then trace an epsilon ray with infinitesimal length and origin at p(v) against this set of circles and accumulate the forces whenever p(v) is inside the circle associated with u ∈ V, given that u ≠ v. The intersection routine of the ray tracer only has to compute the length of the vector between the ray origin and the center of the circle and report an intersection whenever that length is less than 2k. Geometrically, one can think of this as splatting, where the splats whose footprints overlap p(v) exert a repulsive force upon v.
The runtime complexity of the repulsive force computation phase using nearest neighbor queries can be reduced from Θ(|V|²) to Θ(|V| log |V|) using spatial indices like quadtrees [18] or binary space partitioning trees [28] built over V. The spatial index would have to be rebuilt on each iteration. Likewise, the ray tracing query complexity can be reduced in the same manner using a BVH.
Figure 2: Mapping nearest neighbor queries to ray tracing queries. (a) The K5:10 graph; we are interested in the repulsive forces acted upon the green vertex by all the other vertices. (b) Nearest neighbor queries are performed by gathering the vertices inside a circle around the green vertex. (c) With a ray tracing query, instead of expanding a circle around the vertex of interest, we expand circles around all vertices. (d) We trace an epsilon ray (green arrow) originating at the green vertex's position and with infinitesimal length against the circles' geometry. Every circle that overlaps the ray origin—except the circle belonging to the vertex of interest itself—contributes to the force on the green vertex.
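The reversal can be sanity-checked with a small brute-force sketch (ours, not the paper's code): both query directions evaluate the same symmetric predicate |p(v) − p(u)| < r, and therefore select the same vertex set:

```python
import math

def neighbors_by_gather(pos, v, r):
    """Expand a circle of radius r around p(v); gather vertices inside it."""
    px, py = pos[v]
    return {u for u, (x, y) in enumerate(pos)
            if u != v and math.hypot(px - x, py - y) < r}

def neighbors_by_splatting(pos, v, r):
    """Reversed query: circles of radius r around every u; keep those whose
    footprint covers p(v), i.e. the 'epsilon ray' origin."""
    px, py = pos[v]
    return {u for u, (x, y) in enumerate(pos)
            if u != v and math.hypot(x - px, y - py) < r}
```

That the two predicates coincide is what permits handing the neighbor search to a ray tracer that only knows how to intersect rays with (circle) primitives.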
3.2 Implementation with CUDA and OptiX 7
We implemented the FR algorithm with CUDA. We use separate CUDA kernels for the repulsive and attractive forces and for the vertex dispersion phase. Those kernels are called sequentially in a loop over all iterations. The dispersion that is computed during the force phases is stored and updated in a global GPU array.

The parallel attractive force phase uses atomic operations to update the dispersion array. The repulsive phase is implemented using OptiX 7 and the OptiX Wrapper Library (OWL) [35]. Since the number of vertices will never change, we use a global, fixed-size GPU array for the 2-d positions that is shared between CUDA kernels and OptiX programs. Initial vertex placement is random within a square. RTX does not support 2-d primitives, so we construct the BVH from discs with infinitesimal thickness.
The ray generation program spawns one infinitesimal ray per vertex v originating at p(v); we again account for RTX being a 3-d API by setting the z coordinates of the ray origin and direction vector to 0 and 1, respectively. In this way, we can directly accumulate the dispersion inside the intersection program and do not even have to report an intersection that would otherwise be passed along to a potential closest-hit or any-hit program.
For a comparison with a fairly optimized, GPU-based nearest neighbor query, we use a 2-d spatial data structure based on the LBVH algorithm [27, 40]. As the vertices have no area, we obtain a 2-d BSP tree with axis-aligned split planes that subdivide parent nodes into two same-sized halves (middle split). With the restriction relaxed that two split planes need to be placed at once, we should outperform the commonly used grid or quadtree implementations [6, 16]. Using Karras' construction algorithm [24], the build complexity is O(n) in the number of primitives. Our motivation to use a data structure with superior construction performance is that it must be rebuilt after each iteration. We use a full traversal stack in local GPU memory and perform nearest neighbor queries by gathering all vertices within a 2k radius around the current vertex position at the leaves. We have a slight advantage over RTX as our data structure is tailored for 2-d. At the same time, we note that we cannot possibly optimize our data structure in the same way that NVIDIA probably has done with RTX; nor is that our goal with this comparison.
Note that the LBVH and RTX implementations and grid-based FR result in identical graph layouts. In comparison to state-of-the-art implementations in graph drawing libraries such as OGDF [6], Tulip [2], or Gephi [4]—all of which provide sequential CPU implementations of FR—both our RTX and LBVH solutions are orders of magnitude faster. In order to put both our GPU results into perspective, we also implemented the naive GPU parallelization from [25] over just the outer loop of the repulsive force phase.
We report execution times for the four data sets depicted in Table 1. Two artificial data sets consist of many fully connected K5:10 graphs (five vertices, ten edges). In one case we use 5K of those and sequentially connect pairs of them with a single edge. In the second case we use 50K of them as individual connected components. We also test using a complete binary tree with depth 16, as well as the graph representing Twitter feed data that is also depicted in Fig. 1. For the results reported in Table 1 we used an NVIDIA GTX 1080 Ti (no RT cores), an RTX 2070, and a Quadro RTX 8000. The scalability study from Fig. 3 and the evaluation of the repulsive phase in Table 2 were conducted solely on the Quadro GPU.
Our evaluation suggests speedups of 4× to 13× over LBVH. From the difference between the mean iteration times in Table 1 and the mean times for only the repulsive phase in Table 2 we see that the algorithm is dominated by the latter. The other phases plus overhead account for less than 1% of the execution time. While Fig. 3 shows that our method's performance overhead for small graphs can be neglected—because it is on the order of about 1 ms—we observe dramatic speedups that increase asymptotically with |V|.

Interestingly, we see about the same relative speedups on the GeForce GTX GPU and on the RTX 2070 GPU with hardware acceleration. At the same time, we observe that the absolute runtimes differ substantially, which we cannot intuitively explain, as neither the peak performance in FLOPS nor the memory performance of the two GPUs differ that much. Profiling our handwritten CUDA nearest neighbor query, we find tree traversal to be limited by the L2 cache hit rate, which is about 20%. For RTX, such an analysis is impossible and we can only speculate about the results. It is conceivable that the RTX BVH has an optimized memory layout such as the one by Ylitie et al. [38]. Assuming that we are bound by memory access latency, the speedups we observe might stem from better utilization of the GPU's memory subsystem rather than from hardware acceleration. Switching between hardware and software execution on RTX GPUs incurs an expensive context switch. Hardware traversal is interrupted whenever the intersection program is called. For our test data sets, we consistently found the average number of intersection program instances called to be in the hundreds. We might see an adversarial effect where we, on the one hand, benefit from hardware acceleration, but on the other hand suffer from expensive context switches, and the two effects in the end cancel. We find the speedups that we observe reassuring, especially because using RTX lifts from the user the burden of having to program an optimized tree traversal algorithm for the GPU.

Table 1: Statistics and average execution times on different GPUs. We use three artificial graphs with different connectivity and edge degrees, and a Twitter feed graph. C denotes the set of connected components. Execution times are per full iteration including all phases, in milliseconds, listed as naive [25] / LBVH / RTX (ours).

                        5K×K5:10 (connected)    Twitter                Binary Tree (Depth=16)  50K×K5:10 (unconnected)
|V| / |E| / |C|         25K / 69K / 1           68K / 101K / 3K        131K / 131K / 1         250K / 500K / 50K
Vertex degree
(min/max/avg)           4 / 8 / 6               1 / 810 / 3            1 / 3 / 2               4 / 4 / 4
Vertices per component
(min/max/avg)           25K (all)               2 / 44K / 20           131K (all)              5 (all)
RTX 8000                14.78 / 10.99 / 2.566   49.73 / 24.44 / 5.523  189.2 / 65.86 / 5.896   710.3 / 88.33 / 6.826
RTX 2070                16.78 / 12.65 / 2.969   104.0 / 33.81 / 7.958  380.4 / 117.1 / 9.683   1294 / 139.6 / 12.79
GTX 1080 Ti             24.32 / 17.36 / 3.836   191.4 / 97.23 / 9.486  612.6 / 178.8 / 13.83   2236 / 204.8 / 21.96

Table 2: Acceleration data structure statistics on RTX 8000, for the repulsive force computation phases. Execution times per iteration are given in milliseconds and the ratio of build vs. traversal times in percent. We also report total BVH memory consumption in MB.

Data Set                  Mode  Mem   Build         Traversal     ΣFrep  Speedup
5K×K5:10 (connected)      LBVH  1.53  0.92 (8.37%)  10.0 (91.6%)  10.9
                          RTX   1.18  1.16 (45.5%)  1.39 (54.5%)  2.55   4.27×
Twitter                   LBVH  4.16  1.94 (7.94%)  22.5 (92.1%)  24.4
                          RTX   3.22  2.18 (39.7%)  3.31 (60.3%)  5.49   4.44×
Binary Tree (Depth=16)    LBVH  8.00  2.53 (3.84%)  63.3 (96.2%)  65.8
                          RTX   6.19  2.36 (40.3%)  3.50 (59.7%)  5.87   11.2×
50K×K5:10 (unconnected)   LBVH  15.3  2.87 (3.26%)  85.4 (96.7%)  88.3
                          RTX   11.8  2.82 (41.6%)  3.95 (58.4%)  6.77   13.0×
We acknowledge that force-directed methods for large graphs exist that require fewer iterations to arrive at a converged layout, outperform FR by far in this regard [20], and are often based on multilevel optimizations [34]. We chose FR as one of the simplest force-directed algorithms to reason about the speedup and practicability of our approach. Algorithms that perform a nearest neighbor search to compute forces will generally benefit from the proposed techniques. The Fast Multipole Multilevel Method (FM³) [19] employs such a nearest neighbor search and uses a coarsening phase in-between iterations. Similar to our method, the GPU multipole algorithm by Godiyal et al. [14] employs a k-d tree that is rebuilt per iteration, uses stackless traversal, and would likely benefit from RTX.
Figure 3: Scalability study where we build complete binary trees with depth D = 4, 5, ..., 18. Left: linear scale, right: logarithmic scale. Both panels plot mean time (ms) against binary tree depth; we report mean times for only the repulsive force phase.
The GRIP method by Gajer and Kobourov [11] employs a refinement phase that uses FR to compute local displacement vectors. Although we assume that our approach will complement state-of-the-art algorithms with better convergence rates, a thorough comparison is outside of this paper's scope and presents a compelling direction for future work.
We presented a GPU-based optimization to the force-directed Fruchterman-Reingold graph drawing algorithm by mapping the nearest neighbor query performed during the repulsive force computation phase to a ray tracing problem that can be solved with RT core hardware. The speedup over a nearest neighbor query with a state-of-the-art data structure that we observe is encouraging. Force-directed algorithms lend themselves to a parallelization with GPUs. We found that those algorithms can be optimized even further by using RT cores and hope that our work raises awareness for this hardware feature even outside the typical graphics and rendering communities.
[1] A. Arleo, W. Didimo, G. Liotta, and F. Montecchiani. A distributed multilevel force-directed algorithm. IEEE Transactions on Parallel and Distributed Systems, 30(4):754–765, Apr. 2019. doi: 10.1109/tpds.2018.2869805
[2] D. Auber. Tulip - a huge graph visualization framework. In M. Jünger and P. Mutzel, eds., Graph Drawing Software, pp. 105–126. Springer.
[3] J. E. Barnes and P. Hut. A hierarchical O(N log N) force calculation algorithm. Nature, 324:446, 1986.
[4] M. Bastian, S. Heymann, and M. Jacomy. Gephi: An open source
software for exploring and manipulating networks, 2009.
[5] U. Brandes and C. Pich. Eigensolver methods for progressive multi-
dimensional scaling of large data. In M. Kaufmann and D. Wagner,
eds., Graph Drawing, pp. 42–53. Springer Berlin Heidelberg, Berlin,
Heidelberg, 2007.
[6] M. Chimani, C. Gutwenger, M. Jünger, G. W. Klau, K. Klein, and P. Mutzel. The Open Graph Drawing Framework (OGDF). In R. Tamassia, ed., Handbook of Graph Drawing and Visualization, chap. 15, pp. 543–569. CRC Press, Oxford, 2014.
[7] G. Di Battista. Graph drawing: the aesthetics-complexity trade-off. In K. Inderfurth, G. Schwödiauer, W. Domschke, F. Juhnke, P. Kleinschmidt, and G. Wäscher, eds., Operations Research Proceedings 1999, pp. 92–94. Springer Berlin Heidelberg, 2000.
[8] P. Eades. A heuristic for graph drawing. Congressus Numerantium,
42:149–160, 1984.
[9] T. M. J. Fruchterman and E. M. Reingold. Graph drawing by force-directed placement. Software: Practice and Experience, 21(11):1129–1164, 1991. doi: 10.1002/spe.4380211102
[10] P. Gajdoš, T. Ježowicz, V. Uher, and P. Dohnálek. A parallel Fruchterman-Reingold algorithm optimized for fast visualization of large graphs and swarms of data. Swarm and Evolutionary Computation, 26:56–63, 2016. doi: 10.1016/j.swevo.2015.07.006
[11] P. Gajer and S. G. Kobourov. Grip: Graph drawing with intelligent
placement. In J. Marks, ed., Graph Drawing, pp. 222–228. Springer
Berlin Heidelberg, Berlin, Heidelberg, 2001.
[12] E. R. Gansner, Y. Hu, and S. Krishnan. COAST: A convex optimiza-
tion approach to stress-based embedding. In S. Wismath and A. Wolff,
eds., Graph Drawing, pp. 268–279. Springer International Publishing,
[13] E. R. Gansner, Y. Hu, and S. North. A maxent-stress model for graph
layout. IEEE Transactions on Visualization and Computer Graphics,
19(6):927–940, 2013.
[14] A. Godiyal, J. Hoberock, M. Garland, and J. C. Hart. Rapid multi-
pole graph drawing on the gpu. In I. G. Tollis and M. Patrignani, eds.,
Graph Drawing, pp. 90–101. Springer Berlin Heidelberg, Berlin, Hei-
delberg, 2009.
[15] R. Gove. Force-directed graph layouts by edge sampling. In
2019 IEEE 9th Symposium on Large Data Analysis and Visualization
(LDAV), pp. 1–5, 2019.
[16] R. Gove. A random sampling O(n) force-calculation algorithm for graph layouts. Computer Graphics Forum, 38(3):739–751, 2019. doi: 10.1111/cgf.13724
[17] N. A. Gumerov and R. Duraiswami. Fast multipole methods on graphics processors. Journal of Computational Physics, 227(18):8290–8313, 2008. doi: 10.1016/j.jcp.2008.05.023
[18] S. Hachul and M. Jünger. Drawing large graphs with a potential-field-based multilevel algorithm. In J. Pach, ed., Graph Drawing, pp. 285–295. Springer Berlin Heidelberg, Berlin, Heidelberg, 2005.
[19] S. Hachul and M. Jünger. Large-graph layout with the fast multipole multilevel method. Technical report, Zentrum für Angewandte Informatik Köln, 2005.
[20] S. Hachul and M. Jünger. Large-graph layout algorithms at work: An experimental study. Journal of Graph Algorithms and Applications, 11(2):345–369, 2007.
[21] A. Hinge, G. Richer, and D. Auber. Mugdad: Multilevel graph draw-
ing algorithm in a distributed architecture. In Conference on Computer
Graphics, Visualization and Computer Vision, p. 189. IADIS, Lisbon,
Portugal, 2017.
[22] L. Hu, S. Nooshabadi, and M. Ahmadi. Massively parallel kd-tree
construction and nearest neighbor search algorithms. In 2015 IEEE
International Symposium on Circuits and Systems (ISCAS), pp. 2752–
2755, 2015.
[23] T. Kamada and S. Kawai. An algorithm for drawing general undirected graphs. Information Processing Letters, 31(1):7–15, 1989. doi: 10.1016/0020-0190(89)90102-6
[24] T. Karras. Maximizing parallelism in the construction of BVHs, octrees, and k-d trees. In Proceedings of the Fourth ACM SIGGRAPH / Eurographics Conference on High-Performance Graphics, EGGH-HPG'12, pp. 33–37. Eurographics Association, Goslar, Germany, 2012. doi: 10.2312/EGGH/HPG12/033-037
[25] O. Klapka and A. Slaby. nVidia CUDA platform in graph visualization. In S. Kunifuji, G. A. Papadopoulos, A. M. Skulimowski, and J. Kacprzyk, eds., Knowledge, Information and Creativity Support Systems, pp. 511–520. Springer International Publishing, 2016.
[26] S. G. Kobourov. Force-directed drawing algorithms. In R. Tamas-
sia, ed., Handbook of Graph Drawing and Visualization, chap. 12, pp.
383–408. CRC Press, Oxford, 2014.
[27] C. Lauterbach, M. Garland, S. Sengupta, D. Luebke, and D. Manocha. Fast BVH construction on GPUs. Computer Graphics Forum, 2009. doi: 10.1111/j.1467-8659.2009.01377.x
[28] U. Lauther. Multipole-based force approximation revisited – a simple
but fast implementation using a dynamized enclosing-circle-enhanced
k-d-tree. In M. Kaufmann and D. Wagner, eds., Graph Drawing, pp.
20–29. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.
[29] P. Mi, M. Sun, M. Masiane, Y. Cao, and C. North. Interactive graph
layout of a million nodes. Informatics, 3(4):23, 2016.
[30] N. Morrical, W. Usher, I. Wald, and V. Pascucci. Efficient space skipping and adaptive sampling of unstructured volumes using hardware accelerated ray tracing. In 2019 IEEE Visualization Conference (VIS), pp. 256–260, Oct 2019. doi: 10.1109/VISUAL.2019.8933539
[31] A. Panagiotidis, G. Reina, M. Burch, T. Pfannkuch, and T. Ertl. Consistently GPU-accelerated graph visualization. In Proceedings of the 8th International Symposium on Visual Information Communication and Interaction, VINCI '15, pp. 35–41. Association for Computing Machinery, New York, NY, USA, 2015. doi: 10.1145/2801040.2801053
[32] H. C. Purchase. Metrics for graph drawing aesthetics. Journal of Visual Languages & Computing, 13(5):501–516, 2002. doi: 10.1006/jvlc.2002.0232
[33] V. Uher, P. Gajdoš, and V. Snášel. The visualization of large graphs accelerated by the parallel nearest neighbors algorithm. In 2016 IEEE Second International Conference on Multimedia Big Data (BigMM), pp. 9–16, 2016.
[34] A. Valejo, V. Ferreira, R. Fabbri, M. C. F. d. Oliveira, and A. d. A. Lopes. A critical survey of the multilevel method in complex networks. ACM Comput. Surv., 53(2), Apr. 2020. doi: 10.1145/3379347
[35] I. Wald, N. Morrical, and E. Haines. OWL – The OptiX 7 Wrapper Library, 2020.
[36] I. Wald, W. Usher, N. Morrical, L. Lediaev, and V. Pascucci. RTX
Beyond Ray Tracing: Exploring the Use of Hardware Ray Tracing
Cores for Tet-Mesh Point Location. In M. Steinberger and T. Foley,
eds., High-Performance Graphics - Short Papers. The Eurographics
Association, 2019. doi: 10.2312/hpg.20191189
[37] D. Wehr and R. Radkowski. Parallel kd-tree construction on the GPU
with an adaptive split and sort strategy. Int. J. Parallel Program.,
46(6):1139–1156, Dec. 2018. doi: 10.1007/s10766-018-0571-0
[38] H. Ylitie, T. Karras, and S. Laine. Efficient Incoherent Ray Traversal on GPUs Through Compressed Wide BVHs. In V. Havran and K. Vaidyanathan, eds., Eurographics / ACM SIGGRAPH Symposium on High Performance Graphics. ACM, 2017. doi: 10.1145/3105762.
[39] S. Zellmann, M. Aumüller, N. Marshak, and I. Wald. High-Quality Rendering of Glyphs Using Hardware-Accelerated Ray Tracing. In S. Frey, J. Huang, and F. Sadlo, eds., Eurographics Symposium on Parallel Graphics and Visualization. The Eurographics Association, 2020. doi: 10.2312/pgv.20201076
[40] S. Zellmann, M. Hellmann, and U. Lang. A linear time BVH construc-
tion algorithm for sparse volumes. In Proceedings of the 12th IEEE
Pacific Visualization Symposium. IEEE, 2019.
This paper proposes a linear‐time repulsive‐force‐calculation algorithm with sub‐linear auxiliary space requirements, achieving an asymptotic improvement over the Barnes‐Hut and Fast Multipole Method force‐calculation algorithms. The algorithm, named random vertex sampling (RVS), achieves its speed by updating a random sample of vertices at each iteration, each with a random sample of repulsive forces. This paper also proposes a combination algorithm that uses RVS to derive an initial layout and then applies Barnes‐Hut to refine the layout. An evaluation of RVS and the combination algorithm compares their speed and quality on 109 graphs against a Barnes‐Hut layout algorithm. The RVS algorithm performs up to 6.1 times faster on the tested graphs while maintaining comparable layout quality. The combination algorithm also performs faster than Barnes‐Hut, but produces layouts that are more symmetric than using RVS alone. Data and code:
The use of graph visualization approaches to present and analyze complex data is taking a leading role in conveying information and knowledge to users in many application domains. This creates the need of developing efficient and effective algorithms that automatically compute graph layouts. In this respect, force-directed algorithms are arguably among the most popular graph layout techniques. Aimed at leveraging the potential of modern distributed graph algorithms platforms, we present Multi-GiLA , the first multilevel force-directed graph visualization algorithm based on a vertex-centric computation paradigm. We implemented Multi-GiLA using the Apache Giraph platform. Experiments show that it can be successfully applied to compute high quality layouts of very large graphs on inexpensive cloud computing platforms.
Conference Paper
We present a GPU-based ray traversal algorithm that operates on compressed wide BVHs and maintains the traversal stack in a compressed format. Our method reduces the amount of memory traffic significantly, which translates to 1.9--2.1× improvement in incoherent ray traversal performance compared to the current state of the art. Furthermore, the memory consumption of our hierarchy is 35--60% of a typical uncompressed BVH. In addition, we present an algorithmically efficient method for converting a binary BVH into a wide BVH in a SAH-optimal fashion, and an improved method for ordering the child nodes at build time for the purposes of octant-aware fixed-order traversal.
Many today’s practical problems, e.g. bioinformatics, data mining or social networks can be visualized and better examined and understood in the form of a graph. Elaborating big graphs, however, requires high computing power. The performance of CPUs is not sufficient for this purpose but graphics processing unit (GPU) may serve as a suitable high performance, well optimized and low cost platform for calculations of this kind. The article deals with the Fruchterman-Reingold graph and brings solution to this problem; how its layout algorithm can be parallelized for the GPU using nVidia CUDA computing model. This article is continuation and extension of (Klapka and Slaby, The 9th international conference on knowledge, information and creativity support systems, 2014) [8] and gives some other facts and details.