MASTER THESIS
Caching Shading Information in World Space using
Implicit Progressive Low Discrepancy Point Sets
Author:
Matthieu Delaere
Supervisor:
Dr. ing. Jacco Bikker
This thesis is submitted in fulfilment of the requirements
for the degree of Master of Science in Game Technology
In the
International Games Architecture and Design
Academy for Digital Entertainment
05-08-2021
Declaration of Authorship
I, Matthieu Delaere, declare that this thesis titled, Caching Shading Information in
World Space using Implicit Progressive Low Discrepancy Point Sets and the work
presented are my own. I confirm that:
This work was done wholly or mainly while in candidature for a research degree at this
University.
Where any part of this thesis has previously been submitted for a degree or any other
qualification at this University or any other institution, this has been clearly stated.
Where I have consulted the published work of others, this is always clearly attributed.
Where I have quoted from the work of others, the source is always given. With the
exception of such quotations, this thesis is entirely my own work.
I have acknowledged all main sources of help.
Where the thesis is based on work done by myself jointly with others, I have made clear
exactly what was done by others and what I have contributed myself.
Signed: Matthieu Delaere -
Date: 05-08-2021
While computer graphics have been an area of interest of mine for several years, the master program improved my fundamental knowledge of the field. Researching the topic presented in this thesis, and writing the thesis itself, stimulated me to be more critical while keeping an open mind. It was a life-changing experience which I value greatly.
Matthieu Delaere
Breda University of Applied Sciences
Abstract
International Games Architecture and Design
Academy for Digital Entertainment
Master of Game Technology
Caching Shading Information in World Space using Implicit Progressive Low
Discrepancy Point Sets
By Matthieu Delaere
To create realistic computer-generated images, modern renderers often rely on Monte Carlo integration to approximate light transport. While offline rendering systems
can execute multiple samples per pixel to better estimate the outcome of the light
transport simulation, real-time systems are usually limited to one sample per pixel, per
frame, to maintain interactive framerates. Instead, real-time renderers accumulate the
results of multiple frames, by caching the intermediate results, to produce a similar image.
To be able to cache the intermediate results, a discrete representation of the continuous
surfaces in the environment is required. This discretization is often performed using fixed
offsets, which leads to visually intrusive discretization artifacts. To reduce these artifacts,
our approach instead relies on a runtime discretization using an implicit progressive low
discrepancy point set. Compared to other point set techniques, the implicitness of our
method minimizes the required memory footprint, it avoids expensive memory lookups,
and it allows for an adaptive subdivision scheme which guides the density of the implicit
point set. To maximize the effectiveness, our method reconstructs the final image by
filtering the cached shading information directly in world space. We evaluate the method
by reviewing the quality of the reconstructed image, the memory usage of the data structure used to cache the shading information, and by analysing the performance. Supplementary material can be found at http://matthieudelaere.com/project_caching.html.
Acknowledgements
I would like to express my deep and sincere gratitude to my supervisor Dr. ing. Jacco
Bikker for his guidance and valuable insights. Beside my supervisor, I would also like to
thank Thomas Buijtenweg, coordinator of the MGT program at BUAS, and the rest of the
MGT staff for creating this educational program and for taking care of me, and my fellow
students, throughout this exceptional year.
All of this would not have been possible without the support of my parents. I would like
to thank them from the bottom of my heart for supporting me all those years, both
financially and emotionally, and for giving me all the opportunities that made me into the
person I am today.
Finally, I would like to thank my girlfriend, and my closest friends, for supporting me
during the difficult times. Their support has been invaluable and precious.
Contents
Declaration of Authorship
Abstract
Acknowledgements
Chapter 1: Introduction
Chapter 2: Related Work
Chapter 3: Algorithm Overview
3.1 Implicit Point Generation
3.2 Updating Data Structure
3.2.1 Accumulating Shading Information
3.2.2 Merging Data Structures
3.3 Visualizing Cached Information
Chapter 4: Results
4.1 Comparison with Existing Work
4.2 Image Quality
4.3 Memory Usage and Performance
Chapter 5: Discussion and Limitations
5.1 Image Quality
5.2 Memory Usage and Performance
5.3 Limitations
Chapter 6: Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
Appendix A
Bibliography
List of Figures
Chapter 3
1. Overview of passes used in algorithm
2. Discrete position within voxel
3. Fixed point representation
4. Quadrant-based indexing in voxel
5. 2D representations of shared points across levels
6. Implicit point generation algorithm pseudocode
7. Visualization of closest points using barycentric coordinate-based probability
Chapter 4
8. Difference between discretization methods
9. Difference between jittering and random selection
10. Reconstruction using modified Shepard interpolation
11. Renders of Bistro Interior scene
12. Renders of Bistro Exterior scene
13. Renders of Chinese Dragon scene
14. Renders of Hairball scene
15. SSIM over frames using different LODs
16. Ground truth images from Amazon’s Lumberyard Interior & Exterior
Chapter 5
17. Artifacts after modified Shepard interpolation
Chapter 6
18. Minmax-map approach to estimate local shading complexity
List of Tables
Chapter 4
1. Collected metrics
2. The number of unique elements occupying the persistent hash table
3. Render times
Chapter 5
4. Difference between metrics from reconstructed images
List of Abbreviations
GI Global Illumination
MSE Mean-Square Error
PSNR Peak Signal-to-Noise Ratio
SSIM Structural Similarity Index
LOD Level Of Detail
VXGI Voxel Global Illumination
MIP Multum In Parvo (“much in little”)
SVGF Spatiotemporal Variance-Guided Filtering
AO Ambient Occlusion
2D 2-Dimensional
3D 3-Dimensional
PCG Permuted Congruential Generator
API Application Programming Interface
GPU Graphics Processing Unit
Mb Megabyte
ms Milliseconds
RTAO Ray Traced Ambient Occlusion
Dedicated to my parents and my life partner.
Chapter 1: Introduction
The past two decades have seen rapid growth in the computational power of graphics hardware. With these advancements came the possibility to improve
the simulation of virtual environments in real-time interactive applications, such as video
games. One of the improvements in simulation can be found in light transport simulation.
The ability to simulate the behaviour of real-world lighting allows the generation of realistic-looking computer images. While photorealism might not be desirable for all applications, many video games aim for realistic computer graphics as it can help increase immersion. Although light transport for offline applications has been
thoroughly researched in the past years, translating the simulation algorithms to real-
time applications can be challenging due to their complexity.
Creating photorealistic images involves simulating global illumination, which consists of
both direct and indirect illumination. Direct illumination replicates the interaction of light
with surfaces that are directly visible from a light source. When light interacts with a
surface, it can bounce off and illuminate other surfaces indirectly. This is called indirect
illumination. When realistic computer graphics are desirable, the indirect lighting term
cannot be dropped as this would result in dull and artificial looking images. While the
direct illumination term in light transport simulation has been successfully translated to
real-time applications, indirect illumination remains challenging.
Further research is needed to explore techniques that support indirect illumination, and other indirect shading effects such as ambient occlusion, in real-time applications. To maintain interactive framerates, there is only a limited number of computations that can be done each frame. Intuitively, one could reuse information from
previous frames and from similar locations in the 3D environment. In order to reuse
information from different locations over time, a simplified representation of the
continuous surfaces of the 3D environment is required. A common approach is to
discretize the scene into a finite number of elements. Within a given frame, the indirect
illumination is calculated for each element, using a fixed number of directions. The
intermediate results of these calculations are accumulated over several frames to
approximate the final indirect illumination. Most of the current real-time techniques rely
on this spatial discretization and temporal accumulation. Although these techniques do
improve the overall image quality, they often introduce error due to the regular uniform
grid subdivision schemes used to discretize the environment. The effect of this
discretization error can be perceived as aliasing. Aliasing artifacts are generally
conspicuous to the human eye as they can display obvious patterns and shapes [1]. These
artifacts can create unpleasant-looking results and should thus be avoided as much as
possible.
While discretization error as a result of the simplified representation cannot be
completely avoided, it can be reduced by instead using an irregular subdivision scheme.
Irregularity can be produced using stochastic patterns such as Poisson, Poisson Disk and
Uniform Jittering [2], [3]. Although these patterns have been successfully used to generate
point sets, where each point is used to store shading information such as indirect
illumination [4], they are often generated in a pre-process step [5]. Due to the dynamic
and interactive nature of video games, using precalculated point sets can be impractical. Ideally, point sets react at runtime to changes in the environment, scale with the size of the 3D scene, support level of detail to control the density of the point set in areas where more detail is desirable, and limit the use of computer memory.
In order to minimize aliasing artifacts, this thesis attempts to translate the point set
approach for caching shading information to real-time applications, by answering the
following research questions:
How to implicitly create low discrepancy 3D point sets progressively at
runtime while avoiding expensive memory lookups?
How to efficiently query the point set to find the closest neighbour to be able
to store the shading information near the closest primary hit?
What data structure is most desirable to store the shading information in a
massive parallel environment?
How to support adaptive level of detail to control the density of the point set?
We evaluate the proposed method by comparing image quality using the mean-square
error (MSE), peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), by
measuring the rendering time and by analysing the memory requirements.
Chapter 2: Related Work
Many years of research into photorealistic computer graphics led to a good understanding
of light transport simulation. While it is still an active research field, the current
technology allows computers to generate almost photorealistic images. Creating these images using light transport algorithms still requires a significant amount of computation
time. Nonetheless, after extended research, some of the techniques have found their way
into real-time applications, such as video games. While realism in video games can enhance
the immersion and thus the player experience, it is important to be aware of potential
negative effects [6][7]. The technology might also be used for other, illegal, purposes [8].
While being aware of these potential implications, the video game industry is still
researching how to translate the offline light transport algorithms to real-time with a
focus on positive immersion.
The creation of photorealistic computer-generated images is often based on the rendering
equation introduced by Kajiya [9]. While this integral equation describes light transport
in a general form, the equation itself is unsolvable because it is infinitely recursive. Even
though the equation cannot be solved analytically, the light transport simulation can be
approximated using Monte Carlo integration techniques. Monte Carlo integration can
produce realistic images, but it is often computationally expensive [10]. The image quality
depends on the number of samples taken and the distribution of those samples. While
numerous studies improved the convergence rates of Monte Carlo methods, by using
techniques such as importance sampling and next event estimation, multiple samples are
still required to achieve a good estimation. To maintain interactive framerates in real-
time applications, it is impossible to take multiple samples per pixel within a frame to
estimate the indirect illumination of a point using current generation hardware.
Ever since the introduction of the irradiance cache by Ward et al. [11], a lot of research
has gone into studying different caching schemes. Caching of shading information, such
as indirect illumination, allows the real-time application to spread the calculations of the
integration over several frames by reusing the calculations from previous frames. For
many years, a similar caching technique has been successfully used in real-time
applications in the form of lightmaps.
Lightmaps are 2D textures where each element, called lumel, is used to store shading
information. This shading information could be a simple intermediate representation of
irradiance using an RGB value, but it is also possible to store more complex structures such
as a set of Spherical Harmonics [12] or Spherical Gaussians. Lightmapping is thus a non-
volumetric discretization technique that relies on the surface parameterization of the 3D
objects in the scene. Each object is required to have a unique set of UV coordinates. While
this technique has proven its usefulness in video games, it is often limited to low-
frequency effects such as indirect illumination due to the limited resolution of the
lightmap. Most applications using lightmaps only support static indirect illumination,
which is calculated beforehand in a pre-process step. In recent research, Electronic Arts
(EA) [13] investigated a technique using multiple texture caches and hardware ray tracing
to accelerate the generation of lightmaps. Even though the results are promising, the
technique is merely used to help artists with authoring the 3D environment and the final
lightmaps are still baked in a pre-process step. While texture space techniques are mainly
used for static shading information, recent developments show that they can also be used
for storing dynamic shading information, both in a rasterization and a ray tracing pipeline
[14][16]. However, they still rely on surface parameterization based on UV coordinates,
which is often done ahead of time, either manually by an artist or by using an automatic
unwrapping tool.
Voxel techniques avoid the requirement of 2D surface parameterization. Most voxel
techniques discretize the environment based on the 3D information of the objects in the
scene. The 3D information typically includes, but is not limited to, the vertices that
represent the objects and the transformation matrices of those objects. The voxelization
process of static objects is typically performed as a pre-process step. Dynamic objects can
be supported in different ways. Nvidia’s VXGI (Voxel Global Illumination)[17] creates the
voxel structures of dynamic objects at the beginning of each frame, while other techniques
might pre-calculate them and use the transformation matrices to determine the locations
in the scene instead. The voxel representation itself can be stored in different types of
data structures such as a regular grid, a sparse octree or a clipmap [18]. If the structure is
a hierarchical data structure, and the voxelization process is using a multi-resolution
voxelization method, different levels of detail (LOD) can be supported. Once the voxel
representation has been determined, shading information is injected into the voxels themselves. When multiple resolutions are used, the information is typically stored in the
finest level and pre-filtered into the coarser levels. During rendering, a voxel cone tracing
method [19] can be used to gather the information from the voxels.
Although voxel techniques can be categorized as volumetric discretization techniques,
they usually only store information in voxels that relate to surfaces. Greger et al. [20]
extended the idea of the irradiance cache and introduced the irradiance volume. The
volume is represented as a non-uniform subdivided 3D grid with a varying number of
cells. Each cell represents a discretized location in 3D space. The subdivision of the grid is
generally also influenced by the geometry in the scene, but it could be based on other
parameters such as Signed Distance Fields [21]. As common with caching techniques,
different types of information can be stored in the cells. The volumetric approach is often
presented in conjunction with light probes. Light probes are commonly expressed as a set
of Spherical Harmonics or as a cubemap with different mip levels. McGuire et al. [22]
presented a new, though similar structure, which holds light field probes. It is important
to point out that this technique requires an additional discretization based on directions
as each light field probe also encodes geometric information for visibility-aware querying.
Binder et al. [23] introduced a different 3D discretization technique that relies on the
hashing of quantized descriptors. For every 3D point that needs to be shaded, a quantized
representation is created. This quantized position is stored in a descriptor. These
descriptors can also contain other pieces of information, such as the normal of the surface
the point resides on or directions for glossy reflections [24]. A descriptor, with all its data,
is hashed and the resulting index is used to store shading information in a hash map.
Different levels of detail can be supported by considering a level of detail factor during
the hashing. The 3D discretization thus happens implicitly and does not require any pre-
processing step. Like the other methods, this approach still suffers from aliasing due to
the uniform discretization.
Most techniques counter the aliasing by filtering the data at runtime using a spatial
reconstruction filter. This spatial filter considers the shading information of neighbouring
elements. Although it helps smoothing out the discretization error, it does not fully
remove the aliasing artifacts. Other issues that may arise with spatial filters are over-blurring and light leaking. The spatial filter can also be applied in different spaces. Keller
et al. [25] introduced a technique to filter data in path space. Binder et al. [23] extended
this method and introduced a jittered access pattern to reduce aliasing. Schied et al. [26]
introduced a technique called Spatiotemporal Variance-Guided Filtering (SVGF) which
filters data in screen space. Depending on the filtering technique, other operations, such
as edge-stopping functions, might be necessary to avoid the interpolation of unrelated
shading information.
Because of the accumulation of shading information over time, an invalidation technique
is required when dynamic changes in the environment or image are detected. Most
methods rely on a temporal filter to validate information, allowing for information from
previous frames that pass the validity tests to be reused [27]. While there are different
temporal filter techniques available, most of them rely on temporally averaging shading
information using weights that are based on exponential moving averages. Schied et al.
[28] proposed a method to replace the constant accumulation factor from exponential
moving averages with an adaptive accumulation factor based on temporal gradients. This
method improves the temporal stability and reduces temporal over-blurring of the final
image. When the shading information is stored in screen space, such as with SVGF,
reprojecting the information from previous frames is required. This can be done using
either forward- or backwards-projection. Backwards-projection will project a pixel
position from the current frame into the previous frame, while forward-projection will
project a pixel position from a previous frame into the current frame. Reprojecting data
might introduce sub-pixel offsets. These offsets can be countered using a resampling
filter. Other common artifacts are ghosting and flickering.
Lehtinen et al. [29] introduced a meshless hierarchical approach for storing shading
information. This technique avoids the aliasing artifacts by not discretizing the 3D
environment in a uniform way. Instead, a point cloud with Poisson Disk distribution is
pre-calculated. This point cloud can be created using a dart throwing algorithm [30].
Bikker et al. [5] extended this method by also taking into account the local scene
complexity when creating the point cloud. Shading information can be calculated for
every point in the scene, effectively decoupling the geometry representation from the
illumination simulation. Every point in the point cloud is typically represented as a
surface element, or surfel, which can contain additional information for rendering.
Point-based global illumination has been successfully used in offline rendering [4].
Adapting this technique for real-time interactive applications remains challenging as it
relies on a precalculated point cloud. EA SEED [31] introduced the technique in a real-
time setting, taking advantage of the hybrid rendering pipeline for the surfel placement
in world space at runtime. It uses a screen space hole filling algorithm to probabilistically
spawn surfels in the world. The probability is proportional to the pixel’s projected area to
achieve a proper distribution. Even though this method achieves good results in terms of
coverage, it is limited, as it relies on screen space information.
Points or surfels can also be generated in world space using stochastic algorithms. To
avoid any pre-processing step, points need to be created at runtime. As a consequence of
the limited frame time in real-time applications, the generation of the point cloud should
thus happen over multiple frames. This limits the application to only use infinite
progressive sample sequences. Christensen et al. [32] gives a good overview of existing
progressive sample sequences and also introduces several new algorithms to
progressively generate point sets with blue noise properties. These blue noise properties
are valuable as they allow for a good distribution, even with a limited number of samples.
While blue noise properties are not mandatory for real-time applications, having a
sequence with low discrepancy is desirable to reduce the variance [10]. It allows for the
application to make a good estimation with a limited number of points, while avoiding
aliasing artifacts.
Our method attempts to translate the random point-based approach to real-time
applications by merging the implicit hierarchical 3D discretization technique [23], using
hashed descriptors, with an implicit progressive low discrepancy point set. The implicit
progressive low discrepancy point set is used to determine which implicit point is used to
cache shading information. The implicit 3D discretization technique is used to improve
the performance of the neighbouring search, which is necessary to find the closest sample
in the low discrepancy point set, and to reduce the memory footprint of the entire point
set. When the closest implicit point is found, its unique index is used to find an entry
within a hash map, where the actual shading information is stored.
To further reduce discretization artifacts, our method randomly selects one of three
closest implicit points using a probability based on 3D barycentric coordinates instead of
jittered sampling [23]. The reconstructing of the final image is performed in world space
using modified Shepard interpolation [33] rather than using screen space reconstruction
filters [26].
Due to the use of implicit functions, memory fetches are kept to a minimum, which allows
the technique to be used efficiently in massively parallel architectures such as a GPU. It
also reduces the required memory compared to a surfel approach, as the location, and any
additional information, of the implicit point does not need to be stored in memory. The
technique also supports adaptive level of detail by allowing the 3D discretization to change on the fly. This can be done with or without the guarantee of low discrepancy samples being present across levels.
Chapter 3: Algorithm Overview
In this section we introduce our approach for caching shading information in world space
using a limited memory footprint while avoiding discretization artifacts. As mentioned
earlier, storing information in a properly distributed point-based structure can reduce
discretization artifacts [4]. Discretization artifacts occur when shading information for a
continuous surface is cached in a more discontinuous, or discrete, data structure such as a 2D or 3D texture. Whenever calculations are distributed over multiple frames, to maintain interactive framerates, one must cache the shading information between frames. Using current hardware, most applications can only perform a limited number of calculations per pixel per frame. When light transport is being simulated using ray tracing, this limitation is commonly referred to as ‘one sample per pixel’.
Point-based structures often suffer from large memory footprints. This is mainly due to
the need to store the information that represents the point, which is used to cache the
actual shading information. This information can be, but is not limited to, the position of
the point [34]. Techniques that rely on these structures also need to generate the actual
points. This is often done in a pre-process step, which is not desirable for interactive
applications with a lot of dynamic elements, such as video games.
The proposed method attempts to limit the memory footprint by implicitly representing
a point-based structure. Implicit points are generated at runtime, using 3D noise
functions, thus not relying on any pre-process step. These points are then used to find an
entry within a hash table. This is similar to hashing the quantized descriptors, proposed
by Binder et al. [23], with the difference that our method does not quantize information
using fixed offsets, but rather exploits the non-uniform characteristics of properly
distributed point sets. Discretization artifacts produced by implicit points generated
using a coarse subdivision are further reduced by randomly selecting one of the
surrounding implicit points using a probability based on barycentric coordinates.
The use of 3D noise functions, to create the implicit point cloud, also allows for multiple,
adaptive, levels of detail which can be used to control the density of the point cloud.
Where more precision is required, thus in areas that matter the most, more points can be
generated at runtime.
Once the entry within the hash table is found, the calculated shading information gets
injected in the data structure. Because multiple calculations can contribute to the same
entry, an accumulation pass is required, before updating the data structure itself.
Afterwards, shading information can be retrieved from the data structure to reconstruct
the final image using a modified Shepard interpolation. Because the implicit points
represent actual world locations, the data can be interpreted as being stored in world
space, similar to other point-based techniques, and thus allow for world space filtering
instead of screen space filtering. This removes the need to reproject data, partially avoids
the need for motion vectors and improves the filtering.
The proposed method, as already briefly discussed, consists of four sequential passes as
detailed in Figure 1. The output of a pass is used as the input of the next pass in the
sequence. While additional input parameters might be required for calculations within a
pass, they are not presented in the diagram as they are considered pass specific, and they
have no influence on the dependencies between the passes.
Figure 1: An overview of the passes
First, the implicit points are being generated for any given world position, using a unique
seed value based on a discretized version of the world position and a 3D noise function.
Depending on the type of application, the incoming world positions can come from
different types of input buffers. In video games, the world positions usually come from
the output of the visibility test. In ray traced applications these positions are calculated
during a primary hit test, while in rasterized applications, they can be reconstructed from
a depth buffer. Typically, this results in one world position per pixel from a 2D buffer, thus resulting in a linear complexity, $O(n_{\text{pixels}})$. While the implicit points are generated for
each world position, the wanted shading information can be calculated in parallel.
Once all implicit points are known, the closest implicit point to the world position,
associated with a given pixel, is determined using a random selection based on the
probability created from barycentric coordinates. The point, together with its
corresponding seed value, is then stored in a temporary, 2D buffer. During the
accumulation pass, this buffer is used to atomically add all the shading information which
corresponds to the same implicit point, before storing the result in the persistent data
structure. Finally, for a given pixel, the persistent data structure is used to reconstruct the
final image using world space filtering.
3.1 Implicit Point Generation
For any given world position, we start by discretizing
the position based on the voxel size of the wanted level
of detail. This discrete position represents the front top
left corner of our implicit 3D (sub) voxel. As shown in
Figure 2, we further offset this discrete position with
half the voxel size, so it represents the centre of the
voxel. While this step seems optional, it is mandatory to
guarantee seed values are not reused when subdividing
the volume, and to avoid potential reuse of positions
near the origin of the coordinate system.
Using a multidimensional hash function (3D → 1D), the remapped discrete position gets
hashed into a single unsigned integer seed value. While we use a nested version of a
permuted congruential generator (PCG) [35], as advised by Jarzynski et al. [36], any other
nested or linearly combined hash function that converts a 3D vector into a single
unsigned value can be used. While the computation time of the hash function is important
for real-time applications, the quality of the output must also be considered when
selecting a different hash function.
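To make this concrete, the following C++ sketch shows one possible nested construction, built from the single-dimension PCG hash published by Jarzynski and Olano [36]; the exact variant used in the prototype may differ, so treat this as illustrative.

#include <cstdint>

// Single-dimension PCG hash, constants as published by Jarzynski and Olano.
uint32_t pcg(uint32_t v)
{
    uint32_t state = v * 747796405u + 2891336453u;
    uint32_t word = ((state >> ((state >> 28u) + 4u)) ^ state) * 277803737u;
    return (word >> 22u) ^ word;
}

// Nested 3D -> 1D hash: each component perturbs the seed of the next,
// collapsing a discrete 3D position into a single unsigned seed value.
uint32_t hashPosition(uint32_t x, uint32_t y, uint32_t z)
{
    return pcg(x + pcg(y + pcg(z)));
}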
Figure 3: Fixed point representation used for each component
Because the PCG nested hash function expects a 32-bit unsigned integer for each
component, we represent a world coordinate as a set of fixed-point values. To support
signed values along each world axis, and thus avoiding repeating patterns from the hash
function output, the sign bit is stored in the most significant bit. Based on the supported
number of subdivisions, or levels of detail, the fractional part of the world
coordinate gets converted to a fixed-point representation. The remaining bits are used for
the integer part. All three parts are then merged into the final 32-bit fixed point
representation, as depicted in Figure 3. While in theory there is no limit to the number of LODs one can have, the algorithm is bound to the limitations of representing discrete
world positions as 32-bit fixed point values. Allowing more subdivisions, or LODs, results
in a smaller top-level voxel due to the loss of bits in the integer part. This can be resolved
by using 64-bit values instead, but depending on the type of hardware this might have a negative impact on the performance.
Figure 2: Remapping discrete position to centre of sub voxel
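As an illustration, a minimal C++ sketch of the encoding in Figure 3 is given below; the function name and the parameterization by the number of fractional bits are hypothetical choices, not the prototype's exact implementation.

#include <cmath>
#include <cstdint>

// Hypothetical encoder: pack one signed world coordinate into the 32-bit
// layout of Figure 3 (sign bit in the MSB, then integer bits, then the
// fractional bits, whose count matches the supported subdivision levels).
uint32_t toFixed(float coord, uint32_t fracBits)
{
    const uint32_t sign = (coord < 0.0f) ? 0x80000000u : 0u;
    const float magnitude = std::fabs(coord);
    // Scaling by 2^fracBits moves the fraction into the low bits; the
    // remaining 31 - fracBits bits hold the integer part.
    const uint32_t packed = static_cast<uint32_t>(magnitude * float(1u << fracBits));
    return sign | (packed & 0x7FFFFFFFu);
}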
This seed value acquired from the hash
function, which represents our discrete world
position as a single value, together with a
progressive 3D noise function (1D → 3D) that
outputs normalized 3D coordinates within the
[0, 1] range, can be used to generate the implicit
points. Each voxel needs eight implicit points,
one in each quadrant, to provide a good
distribution and to have at least one sample in
its sub voxels. The latter is important to
minimize clumping and, if desired, to guarantee shared points across different levels.
While all eight points are generated using the same base seed value, the seed gets
incremented with the index of the quadrant as shown in Figure 4. This results in a pattern
similar to a space-filling curve. Finally, the normalized 3D coordinates obtained from the
3D noise function are remapped into absolute 3D coordinates, using the implicit voxel size
based on the LOD and the quadrant offset, to produce stratified world space locations
within the voxel.
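The sketch below illustrates this generation step in C++. The expansion of one seed into a normalized 3D coordinate by chaining the earlier pcg() hash, and the bit-per-axis quadrant offsets, are assumptions for illustration; the prototype's noise function and quadrant ordering (Figure 4) may differ.

#include <cstdint>

struct Float3 { float x, y, z; };

// Expand a single seed into a normalized 3D coordinate in [0, 1) by
// chaining the 1D pcg() hash from the earlier sketch (an assumed
// 1D -> 3D noise function).
Float3 noise3D(uint32_t seed)
{
    const float inv = 1.0f / 4294967296.0f; // 2^-32
    const uint32_t a = pcg(seed);
    const uint32_t b = pcg(a);
    const uint32_t c = pcg(b);
    return { a * inv, b * inv, c * inv };
}

// Generate the eight implicit points of a voxel: one per quadrant, each
// seeded with baseSeed + quadrant index and remapped into that quadrant.
void generateVoxelPoints(uint32_t baseSeed, Float3 voxelMin, float voxelSize,
                         Float3 outPoints[8])
{
    const float half = 0.5f * voxelSize; // edge length of a quadrant (sub voxel)
    for (uint32_t q = 0; q < 8; ++q)
    {
        const Float3 n = noise3D(baseSeed + q);
        // Quadrant offset taken from one bit per axis of the index.
        const Float3 o = { (q & 1u) ? half : 0.0f,
                           (q & 2u) ? half : 0.0f,
                           (q & 4u) ? half : 0.0f };
        outPoints[q] = { voxelMin.x + o.x + n.x * half,
                         voxelMin.y + o.y + n.y * half,
                         voxelMin.z + o.z + n.z * half };
    }
}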
When a denser point set is desirable the previously mentioned sub voxels can be further
subdivided using the same technique. For every sub voxel, the centre point is acquired
and used as the seed value for the next eight implicit points, effectively repeating the same
steps. Because of the use of the implicit sub voxel size, for remapping the random points
to absolute 3D coordinates, the new implicit points tend to be closer to the 3D surface.
While each sub voxel usually generates eight new implicit points for efficiency, the
technique can also be adjusted to guarantee samples across different levels using the
following method.
At the highest level of detail, LOD 0, the algorithm needs to produce eight implicit points
within the voxel as depicted in Figure 5 (a). Whenever a finer LOD for a sub voxel is
needed, only seven new implicit points are required as one point from the previous level
is already present in one of the quadrants of the sub voxel (b). Depending on the output
of the 3D noise function, and the current sub voxel, the shared implicit point can come
from any one of the previous levels (c). As a result, whenever new implicit points need to
be generated for a sub voxel with a certain LOD, one must iterate through every level and
generate the implicit point for that quadrant. While doing this, the implicit point for the
current level is remapped to the required LOD to determine which quadrant is already
occupied. After this comparison with linear complexity, $O(n_{\text{LOD}})$, the seed value which
represents the shared implicit point and the occupied quadrant are known, allowing for
the correct generation of the seven new implicit points, one in each empty quadrant. This
comparison with linear complexity can have a profound impact on performance as this
usually gets executed for every pixel.
Figure 4: Indexing based on quadrant
Figure 5: 2D representations of shared points across levels. (a) 4 implicit points, 1 per quadrant, generated using seed value Se0. (b) 3 implicit
points generated using seed value Se1, 1 point shared from Se0. (c) 3 implicit points generated using seed value Se2, 1 point shared from Se1,
and 3 implicit points generated using seed value Se3, 1 point shared from Se0.
Whenever one is interested in finding the closest implicit point to a world position, either
for caching shading information or for filtering in world space, it is important to also
review the neighbouring voxels. It is possible that the 3D noise function generates an implicit point in a neighbouring voxel that is closer to the world position than any point in the voxel the world position resides in. While it is recommended to analyse all
26 neighbouring voxels for maximum precision, it is possible to limit the number of voxels
to compare (e.g. 0-connectivity, 6-connectivity or 18-connectivity). Lowering the
connectivity results in better performance, but it also introduces additional discretization
artifacts. To limit the number of artifacts, all results in this paper are acquired using a 26-
connectivity comparison.
Figure 6: Pseudocode of the implicit point generation algorithm, without guaranteeing samples across LODs
Combining the aforementioned techniques, without guaranteeing samples across levels
of detail, results in the compact algorithm presented in Figure 6. To minimize the use of
memory and to limit the memory transfer, thus gaining optimal performance, it is
important to not cache the generated points. Instead, one can easily add additional
calculations to the loop that generates the implicit points.
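The following C++ sketch mirrors the structure of the algorithm in Figure 6 as described above: discretize the query position, visit the containing voxel and its 26 neighbours, regenerate each voxel's eight implicit points on the fly, and keep only the closest one. It builds on the earlier sketches (pcg/hashPosition, toFixed, generateVoxelPoints) and uses an assumed constant for the number of fractional bits.

#include <cfloat>
#include <cmath>
#include <cstdint>

constexpr uint32_t kFracBits = 8; // assumed fractional bit count / LOD budget

struct ClosestPoint { Float3 position; uint32_t seed; };

// Sketch in the spirit of Figure 6 (no samples shared across LODs).
ClosestPoint findClosestImplicitPoint(Float3 p, float voxelSize)
{
    ClosestPoint best = {};
    float bestDist2 = FLT_MAX;
    // 26-connectivity: the containing voxel plus all 26 neighbours.
    for (int dz = -1; dz <= 1; ++dz)
    for (int dy = -1; dy <= 1; ++dy)
    for (int dx = -1; dx <= 1; ++dx)
    {
        // Front top left corner of this (neighbouring) voxel.
        const Float3 corner = { (std::floor(p.x / voxelSize) + dx) * voxelSize,
                                (std::floor(p.y / voxelSize) + dy) * voxelSize,
                                (std::floor(p.z / voxelSize) + dz) * voxelSize };
        // Seed from the voxel centre, encoded as fixed point (Figures 2 and 3).
        const float h = 0.5f * voxelSize;
        const uint32_t seed = hashPosition(toFixed(corner.x + h, kFracBits),
                                           toFixed(corner.y + h, kFracBits),
                                           toFixed(corner.z + h, kFracBits));
        Float3 pts[8];
        generateVoxelPoints(seed, corner, voxelSize, pts);
        for (uint32_t q = 0; q < 8; ++q)
        {
            const float ex = pts[q].x - p.x;
            const float ey = pts[q].y - p.y;
            const float ez = pts[q].z - p.z;
            const float d2 = ex * ex + ey * ey + ez * ez;
            // Each point's unique seed is the voxel seed plus its quadrant index.
            if (d2 < bestDist2) { bestDist2 = d2; best = { pts[q], seed + q }; }
        }
    }
    return best;
}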
During the first pass, we are interested in finding the closest implicit point for a given
world position. This point, along with its seed value, will ultimately be used to cache the
related shading information in the persistent data structure. Finding the closest point
could be as simple as comparing the Euclidean distance of every point in the loop to the
actual world position, caching only one point. This results in a Voronoi diagram pattern,
as seen in Figure 7 (a-b). Typically, this pattern displays hard edges near the borders of
every cell. This can be partially avoided by instead randomly selecting one of three closest
points, based on a probability determined by its 3D barycentric coordinates, resulting in
a smoother transition, as depicted in Figure 7 (c). Compared to jittered sampling
proposed by Binder et al. [23], this approach naturally scales to the subdivided volume
the random points reside in and it allows for more implicit points to receive some shading
information.
First, the squared Euclidean distances $d_i$ between the world position $x$ and the three closest points $p_0..p_2$ are calculated (1.1). Using these distances, an area for each implicit surface is determined (1.2). Using the sum of these area values, each area value gets normalized (1.3). Finally, a random value $\xi$, in range $[0,1]$, is generated and the final index is determined using (1.4), where $\operatorname{clamp}$ clamps its argument to the $[0,1]$ range, using whole numbers after the respective division, and $\bar{a}_i$ acts as the probability of each point.
This results in an index of 0, 1 or 2, which can be used to select one of the three closest
implicit points. Although this provides a smoother transition along the edges, it takes
additional computation time and it requires the temporary caching of three points instead
of one.
(,,)=,,
=
++
=
+
+
(1.1)
(1.2)
(1.3)
(1.4)
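A compact C++ sketch of this selection follows. The area terms reflect our reading of Equation 1 (each point's area is the product of the other two squared distances) and should be treated as an assumption; the clamped divisions of (1.4) are written as equivalent comparisons for clarity.

// Select one of the three closest implicit points (Equation 1). d0..d2 are
// the squared distances of (1.1); xi is a uniform random number in [0, 1).
int selectPointIndex(float d0, float d1, float d2, float xi)
{
    // (1.2) Area per point, assumed to be the product of the distances to
    //       the other two points, so the nearest point gets the largest area.
    float a0 = d1 * d2;
    float a1 = d0 * d2;
    float a2 = d0 * d1;
    // (1.3) Normalize so the areas act as probabilities.
    const float sum = a0 + a1 + a2;
    a0 /= sum;
    a1 /= sum;
    // (1.4) Written as a CDF walk, equivalent to the clamped divisions.
    int index = 0;
    if (xi >= a0)      index = 1;
    if (xi >= a0 + a1) index = 2;
    return index;
}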
Once the closest point is determined, the point and its seed value are temporarily cached
in a 2D buffer for the next pass, where every texel relates to a given world position.
Figure 7: Determining closest points. (a) uses a single closest point, visualizing every cell as a solid colour. (b) uses a single closest point,
visualizing the distance to the closest implicit point. (c) uses Equation 1 to select one of three closest points.
3.2 Updating Data Structure
Once the visible implicit points are known, the respective shading information can be
injected in the persistent data structure. Because of the implicitness of our algorithm, a
simple hash table can be used to store the information, conforming to the technique proposed by Binder et al. [23]. Compared to other common caching schemes [37], this reduces the
memory footprint, as no information of the point needs to be stored due to the key
hashing behaviour of a hash table. While this data structure can be interpreted as a simple
sequential structure, during the point generation phase the data can also be seen as an
implicit k-d tree, due to the support of adaptive levels of detail.
Because of the hashing scheme used, every unique point should get a unique identifier.
This unique identifier can be the seed value that was retrieved during the previous pass.
One can also hash this seed value again using a hash (1D → 1D) function, if this is desirable
for the hash table implementation. The final identifier can then be used as the key for the
hash table entry. The results in this paper use the seed value generated during the first
pass, thus we do not hash the seed value again, and use a simple hash table which relies
on linear probing.
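For reference, a minimal CPU-side illustration of such a linearly probed table is given below; the choice of zero as the reserved empty key is an assumption, and the prototype's GPU implementation is more involved.

#include <cstdint>
#include <vector>

// One hash table entry: key (seed), fixed-point shading value and sample count.
struct Entry { uint32_t key = 0; uint32_t value = 0; uint32_t count = 0; };

constexpr uint32_t kEmptyKey = 0u; // assumes 0 never occurs as a seed value

// Find the slot holding `key`, or the empty slot where it should be inserted.
uint32_t findSlot(const std::vector<Entry>& table, uint32_t key)
{
    const uint32_t size = static_cast<uint32_t>(table.size());
    uint32_t slot = key % size;
    // Linear probing: advance until the key or an empty slot is found.
    while (table[slot].key != key && table[slot].key != kEmptyKey)
        slot = (slot + 1) % size;
    return slot;
}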
Before updating the persistent data structure previously mentioned, it is important to
accumulate all the shading information which relates to the same implicit point. How
much shading information gets assigned to the same implicit point depends on the LOD
used during the point generation phase.
3.2.1 Accumulating Shading Information
We accumulate the information by iterating over every texel of the temporary 2D buffer
which was created during the previous pass. Every texel contains the world position of
the implicit point and its corresponding seed value. Using the seed value, an entry is found
in a new, temporary, hash table. This hash table is smaller than the final persistent data
structure. Usually there is a direct relationship between the temporary 2D buffer from the
point generation pass and the 2D buffer containing the shading information, where every
texel in the shading information buffer maps to a texel in the 2D point buffer. This results
in a worst-case scenario where each texel contains a unique seed value, thus no
accumulation is necessary. In this case, the size of the temporary hash table is equal to the
number of texels in the 2D buffers.
During accumulation, when a seed value is already present in the hash table, the shading
information gets atomically added to the value that is already present in the hash table. Using atomic operations prevents the use of mutually exclusive locks, thus simplifying the code, improving performance, and avoiding potential lock issues. Depending on the
hardware, and the used API, only 32-bit integer atomic operations might be available.
While more APIs are starting to support 64-bit integer and floating point atomic
operations, it is advised to use 32-bit integer atomic operations to support a wider range
of hardware, and to reduce the memory used in the hash table.
When relying on 32-bit integer atomic operations, the shading information must also be
represented as an unsigned fixed-point integer. To support the worst-case scenario, it is a good practice to allocate enough bits for the integer part. For a resolution of 1920x1080 pixels, a maximum integer value of 2,073,600 could be produced. To support this range, 21 bits are needed, or $\lceil \log_2(1920 \cdot 1080) \rceil = 21$. While one can lower this number to allow
more precision for the fractional part, 11 bits are usually enough. When negative values
need to be supported, an additional bit must be used for the sign bit.
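A sketch of the accumulation step with 32-bit integer atomics is shown below, using std::atomic as a CPU stand-in for the GPU's atomic add (e.g. InterlockedAdd in HLSL); the 21.11 integer/fraction split follows the discussion above.

#include <atomic>
#include <cstdint>

constexpr uint32_t kAccumFracBits = 11; // 21 integer bits, 11 fractional bits

// Atomically accumulate one shading sample (e.g. an ambient occlusion value
// in [0, 1]) into a hash table entry's value and count fields.
void accumulateSample(std::atomic<uint32_t>& value,
                      std::atomic<uint32_t>& count,
                      float shading)
{
    // Convert to unsigned fixed point before the integer atomic add.
    const uint32_t fixed =
        static_cast<uint32_t>(shading * float(1u << kAccumFracBits));
    value.fetch_add(fixed, std::memory_order_relaxed);
    count.fetch_add(1u, std::memory_order_relaxed);
}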
In addition to accumulating the shading information in a 32-bit unsigned (fixed point) integer representation, the hash table key itself and a count variable also need to be stored. This
count variable is important for correctly updating the persistent data structure, as we are
integrating the results over several frames, similar to Monte Carlo integration.
Considering all the required information, every entry in the hash table ends up with a
memory footprint of 8 bytes for the additional information, plus the number of bytes
needed for caching the actual shading information. When caching ambient occlusion, this
would add up to a total memory footprint of 12 bytes (4 bytes for the shading information,
plus 8 bytes for the key and count values). For optimal performance, one can pad the memory footprint to 16 bytes to accommodate the 16-byte memory alignment of most GPU hardware, thus avoiding additional cache thrashing.
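The 12-byte entry described above, with the optional padding to 16 bytes, could look as follows; the field names are illustrative.

#include <cstdint>

// Hypothetical hash table entry layout. The first three fields are the
// 12 bytes described above; the padding realizes the optional 16-byte
// alignment for GPU-friendly access.
struct CacheEntry
{
    uint32_t key;   // hashed seed value identifying the implicit point
    uint32_t value; // shading information, 32-bit unsigned fixed point
    uint32_t count; // number of samples accumulated over time
    uint32_t pad;   // optional padding up to 16 bytes
};
static_assert(sizeof(CacheEntry) == 16, "expected a 16-byte aligned entry");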
3.2.2 Merging Data Structures
After accumulating all the shading information for a given implicit point, the new data
must be injected in the final, persistent, data structure. This data structure is also a single
hash table with the same memory footprint per element as the temporary hash table from
the accumulation pass. The structure caches the shading information, in world space, over
several frames. It is the only persistent data structure in the algorithm. The total size of
the hash table is larger as it represents all the used implicit points in the 3D scene. This
size is determined by the voxel volume in LOD 0 and the number of supported LODs.
While the accumulation hash table caches the shading information that has, potentially,
been atomically added, the persistent hash table caches the shading information in a
normalized fixed-point representation, with range [0,1]. As a result, only one bit is needed
for the integer part and the remaining bits can be used for the fractional part.
Before adding the accumulated data $a$ to the normalized cached data $c$, the accumulated data needs to be normalized using the related count (2.3). Because both data structures might have a different count, the cached data must be rescaled with a correct weight value (2.3), based on the respective counts ($n_c$ representing the cached count and $n_a$ representing the accumulated count) and the total count $n_t$ (2.2). To prevent potential overflow after acquiring a lot of samples, the total count gets scaled with a constant $k$. The range of this constant value is $[0,1]$, with a default value of 0.9.
=+
=
+


(2.1)
(2.2)
(2.3)
After rescaling and adding both the accumulated and cached data, the entry in the
persistent hash table can be overwritten with the result $r$, after converting it to a 32-bit
fixed point representation. Because there are no values that will write to the same entry,
and the value gets overwritten, injecting the new data in the persistent hash table is a
simple find and replace. There are no atomic operations required for this replacement.
The count in the persistent entry must also be replaced with the new total count $n_t$, to
guarantee correct integration with future calculations. At this point, the persistent data
structure is up to date and contains the most recent shading information, possibly
collected over several frames. The temporary hash table and other temporary 2D buffers
are no longer required and can be cleared, so they can be reused during the next frame.
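The merge of one entry can be summarized in a few lines of C++; the variable names follow the reconstruction of Equation 2 above, with the constant k defaulting to 0.9 as stated.

// Merge one accumulated entry into its persistent counterpart (Equation 2).
// `cached` is the normalized value in the persistent table, `accumSum` the
// atomically accumulated (un-normalized) sum from the temporary table.
struct MergedEntry { float value; float count; };

MergedEntry mergeEntry(float cached, float cachedCount,
                       float accumSum, float accumCount, float k = 0.9f)
{
    const float total = cachedCount + accumCount;       // (2.1)
    const float w = cachedCount / total;                // (2.2)
    const float result = w * cached + accumSum / total; // (2.3)
    // Scale the stored count by k to prevent overflow after many samples.
    return { result, k * total };
}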
3.3 Visualizing Cached Information
Using the persistent hash table, which contains the shading information accumulated over
multiple frames, the final image can be constructed. Because the information is stored in
world space, one can easily retrieve the required shading information using the
previously mentioned implicit point algorithm (Figure 6). It also avoids the need for
reprojection or the use of motion vectors. Performing the same algorithm is necessary to
acquire the seed values of all implicit points that are useful during reconstruction, so that
the cached shading information can be retrieved from the persistent hash table. It is
imperative to not cache any useless information during these calculations to limit the
memory usage and memory transfer.
When retrieving the shading information associated with each implicit point, the final value is reconstructed using a modified Shepard approximation [33], as suggested by Bikker et al. [5]. The search distance $R$ for the approximation is based on the LOD used by the implicit point algorithm. For each implicit point, a weight value $w_i$ is calculated (3.1). The weight value is determined by the difference between the search distance and the distance between the world position of the implicit point and the world position $x$ we are trying to reconstruct. While most generated implicit points will reside within the search radius, the weight value should be clamped to prevent any unwanted contributions when using a larger search distance. The resulting weight value should be in the range $[0, R]$. When calculating the total sum of the cached shading information of each implicit point, the information should be rescaled using the weight value associated with each point.
After performing the addition, the summation is divided by the sum of all the weights
(3.2). This results in our final filtered shading information, denoted as $s$.
$$w_i = \operatorname{clamp}\!\left(R - \lVert x - p_i \rVert,\; 0,\; R\right) \tag{3.1}$$
$$s = \frac{\sum_i w_i\, c_i}{\sum_i w_i}, \quad \text{with } c_i \text{ the shading information cached at point } p_i \tag{3.2}$$
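In code, the reconstruction reduces to a weighted average over the implicit points found near the shaded position. The sketch below follows Equations (3.1) and (3.2) as reconstructed above; the linear, clamped weight is our reading, and the exact modified Shepard weight of [33] may differ.

#include <cmath>

// World space reconstruction (Equation 3): weigh each implicit point's
// cached shading value by how far inside the search distance R it lies.
float reconstructShading(Float3 x, const Float3* points, const float* cached,
                         int count, float R)
{
    float weightedSum = 0.0f;
    float weightSum = 0.0f;
    for (int i = 0; i < count; ++i)
    {
        const float dx = points[i].x - x.x;
        const float dy = points[i].y - x.y;
        const float dz = points[i].z - x.z;
        const float d = std::sqrt(dx * dx + dy * dy + dz * dz);
        // (3.1) Weight clamped to [0, R]; far-away points contribute nothing.
        const float w = std::fmax(0.0f, std::fmin(R - d, R));
        weightedSum += w * cached[i];
        weightSum += w;
    }
    // (3.2) Normalize by the total weight.
    return (weightSum > 0.0f) ? weightedSum / weightSum : 0.0f;
}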
In this chapter we proposed an algorithm, inspired by the work of Binder et al. [23], to cache shading information in world space using a hash table, where the
entries within the data structure are determined using an implicit progressive low
discrepancy point set. Due to the non-uniform distribution, in world space, discretization
errors should be reduced. The algorithm relies on implicit 3D noise functions to generate
its point set, allowing for reproducible results at runtime, thereby keeping the memory
usage and the memory transfer to a minimum.
The algorithm performs its calculations in four sequential passes. First the closest implicit
point for each given world position is determined, while the shading information is being
calculated in a parallel pass. Each implicit point is generated using a unique identifier,
based on a discrete representation of the world location, which serves as the seed value
for the implicit 3D noise function. Once all the visible implicit points are known, extra
randomness is introduced by randomly selecting one of three closest points using a
probability based on barycentric coordinates. After determining the closest implicit point, shading information can be injected in the hash table. Before updating the data structure,
one must first atomically accumulate all the shading information that is associated with
the same implicit point. This accumulation pass is necessary for the correct integration of
the information, but also to maximize the performance by limiting the merge to a subset
of the persistent data structure. After updating the data structure, for any given world
position, a final image can be reconstructed by interpolating the cached shading
information in world space from neighbouring entries in the hash table. These entries are
also determined using the same implicit 3D noise function.
The next chapter presents the results of the experiments performed using the proposed
algorithm. For every experiment, the resulting image produced by the algorithm is
compared with a ground truth equivalent. To evaluate the quality, we calculate the mean-
square error (MSE), the peak signal-to-noise ratio (PSNR) and the structural similarity
index (SSIM) between both images. Performance is evaluated by measuring the average
rendering time, in milliseconds, and by analysing the characteristics of the memory usage.
Chapter 4: Results
In this section, we present the results of the experiments performed using the proposed
algorithm. The algorithm is used to cache ambient occlusion, which typically has a higher
frequency compared to indirect illumination. The technique is evaluated by comparing it
to existing work. The comparison is performed by evaluating the output of the used
technique, either using the proposed algorithm or using an existing technique, with a
ground truth image. The quality of the final image is evaluated by capturing and analysing
the mean-square error (MSE), the peak signal-to-noise ratio (PSNR) and the structural
similarity index (SSIM).
All experiments are performed using a system with the following specifications: AMD
Ryzen 9 3950X 16-Core Processor, 32 GB DDR4 DRAM 3200 MHz and Nvidia RTX 3070
with 8GB VRAM. The image quality metrics are collected using the scikit-image image
processing package, version 0.18.1, in Python [38]. Every ground truth image is
constructed using 2048 samples per pixel (SPP), while the real-time counterparts are
generated using 1 SPP over 64 frames. All images are rendered at HD resolution (1920 x
1080) with a prototype developed using Microsoft’s MiniEngine. This prototype uses a
coarse volume of 2560 units for LOD 0, with every image using a LOD as specified in the
results. Finally, all results are generated using the algorithm that does not guarantee
samples across different levels as presented in Figure 6, unless stated otherwise.
4.1 Comparison with Existing Work
Binder et al. [23] uses fixed discretization to cache shading information in its hash table,
which represents discrete locations in world space as depicted in Figure 8 (a). The
proposed method instead uses the fixed discrete position, together with the LOD
information, to seed a random noise function which generates eight implicit random
points. To guarantee the same density, the random point technique must use the fixed
positions, to generate the seed value, from LOD-1 (Figure 8 (b)).
Figure 8: Difference between discretization methods.
(a) Fixed discretization using LOD 9. (b) Discretization using implicit random points based on fixed positions from LOD 8.
To minimize discretization artifacts, before assigning a discrete cell to a pixel, Binder et
al. [23] jitters the world positions in tangent space, effectively generating more noise as
seen in Figure 9 (a). The amount of noise generated is based on the allowed jittering
distance. The proposed method randomly picks one of three closest points using a
probability based on barycentric coordinates. This method scales with the distance
between the implicit points and does not require an additional parameter. In Figure 9 (b),
the technique is used with a fixed discretization, while Figure 9 (c) uses the random
implicit points from LOD-1.
Figure 9: Difference between jittering and random selection using probability based on barycentric coordinates.
(a) Fixed discretization using jittering (radius 3 LOD 11). (b) Fixed discretization using barycentric coordinates-based probability (LOD 11).
(c) Implicit points using barycentric coordinates-based probability (LOD 10).
To reconstruct the final image Binder et al. [23] uses a screen space smoothing filter
similar to SVGF [26]. Instead, our proposed method uses a modified Shepard interpolation
which interpolates the cached shading information, from the implicit points within the
current discrete cell and its neighbouring cells, directly in world space. The search radius
is based on the implicit cell size and thus based on the LOD. We compare the results of
reconstructing the final image using the modified Shepard interpolation for all three
variants as seen in Figure 10.
Figure 10: Reconstruction using modified Shepard interpolation in world space.
(a) Fixed discretization using jittering (radius 3 LOD 11). (b) Fixed discretization using barycentric coordinates-based probability (LOD 11).
(c) Implicit points using barycentric coordinates-based probability (LOD 10).
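A minimal sketch of such a world space reconstruction, assuming the candidate samples have already been gathered from the current cell and its neighbours. The finite-support (Franke-Little) variant of Shepard's weights [33] is used here for illustration; the exact modification in the prototype may differ:

#include <algorithm>
#include <cmath>
#include <vector>

struct CachedSample { float px, py, pz; float value; };  // shading cached at an implicit point

// Reconstructs the shading value at world position (x, y, z) from cached
// samples. R is the search radius, derived from the implicit cell size and
// thus from the LOD; samples outside R contribute nothing.
float shepardFilter(float x, float y, float z,
                    const std::vector<CachedSample>& samples, float R) {
    float num = 0.0f, den = 0.0f;
    for (const CachedSample& s : samples) {
        float dx = s.px - x, dy = s.py - y, dz = s.pz - z;
        float d = std::sqrt(dx * dx + dy * dy + dz * dz);
        if (d >= R) continue;                          // outside the search radius
        float w = (R - d) / (R * std::max(d, 1e-6f));  // finite-support weight
        w *= w;                                        // squared for a smoother falloff
        num += w * s.value;
        den += w;
    }
    return den > 0.0f ? num / den : 0.0f;  // no data found: fall back to zero
}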
4.2 Image Quality
We measure the image quality by obtaining the MSE, PSNR and SSIM from multiple
scenes (Amazon's Lumberyard Bistro Interior & Exterior, Georgia's Chinese Dragon and
Nvidia's Hairball), using the scikit-image image processing package, version 0.18.1, in
Python [38]. The metrics are computed on images stored in the TIFF format. For the SSIM
metric, the parameters match the implementation of Wang et al. [39]. All models are
gathered from Morgan McGuire's Computer Graphics Archive [40].
Table 1: Collected metrics using different set-ups as discussed in Section 4.1
Figure 11: Bistro Interior: Cabine (close-up)
Figure 12: Bistro Exterior: Bike (close-up)
Figure 13: Chinese Dragon (close-up)
Figure 14: Hairball (close-up)
Figure 15: SSIM over frames using different Levels of Detail.
Table 1 quantifies the differences between the reconstructed images using the modified
Shepard interpolation and the respective ground truth image, as described earlier in
Section 4.1, as well as the difference between the unfiltered images and the same ground
truth image. For each scene, we compare the filtered and unfiltered versions using fixed
discretization, either with jittered sampling or with barycentric coordinate-based random
selection, and using random discretization with the latter selection scheme. The images
used to acquire these metrics are shown in Figures 11 to 14.
Figure 15 shows the SSIM metric of several renders, using the same static scene, taken
over multiple frames. Every image uses the world space data captured during previous
frames, thus better approximating the final image by accumulating shading information
over time. The images used to extract these metrics are reconstructed with the proposed
method. The analysis is done with renders from Amazon's Lumberyard Bistro Interior and
Exterior scenes, as depicted in Figure 16.
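The accumulation itself is a running average per implicit point: with $x_n$ the $n$-th shading sample and $\bar{x}_{n-1}$ the cached value, the update is the standard incremental mean, equivalently implemented by storing a running sum and a sample count, as the hash table in Section 4.3 does:

$$\bar{x}_n = \bar{x}_{n-1} + \frac{x_n - \bar{x}_{n-1}}{n}$$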
Figure 16: Ground truth images from Amazon’s Lumberyard Interior & Exterior used for metrics in Figure 15.
4.3 Memory Usage and Performance
All results recorded in this section use a simple GPU hash table inspired by the
implementation of David Farrell [41], which relies on linear probing whenever there is a
hash collision. Every element in the hash table stores a hash key, a shading value and a
count which is used for accumulating samples over time. All values are stored in a 32-bit
unsigned integer type, which represents either an integer number or a real number using
a fixed-point representation. The default hash table allocates enough memory for
4,194,304 elements, where each element is 12 bytes in size, which results in a total
memory footprint of approximately 50.3 megabytes (MB).
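A minimal CPU-side sketch of this element layout and of accumulation with linear probing, in the spirit of Farrell's simple GPU hash table [41]. On the GPU, claiming a slot would use an atomic compare-and-swap, which is omitted here, and key 0 is assumed to be reserved as the empty marker:

#include <cstdint>
#include <vector>

// One cached element: 3 x 32 bit = 12 bytes, matching the prototype.
struct Element {
    uint32_t key;    // hash key identifying an implicit point (0 = empty slot)
    uint32_t value;  // accumulated shading, stored as 16.16 fixed point here
    uint32_t count;  // number of accumulated samples
};

constexpr uint32_t kCapacity = 4u * 1024u * 1024u;  // 4,194,304 elements, ~50.3 MB

// Accumulates one shading sample for the given key using linear probing.
// std::vector<Element> table(kCapacity) starts zero-initialized (all empty).
bool accumulate(std::vector<Element>& table, uint32_t key, float shading) {
    uint32_t slot = key % kCapacity;
    for (uint32_t probe = 0; probe < kCapacity; ++probe) {
        Element& e = table[(slot + probe) % kCapacity];
        if (e.key == 0u) e.key = key;  // claim the empty slot (atomicCAS on GPU)
        if (e.key == key) {
            e.value += (uint32_t)(shading * 65536.0f);  // 16.16 fixed point
            e.count += 1;                               // lookup returns value / count
            return true;
        }
        // Slot taken by another key: probe the next slot. This is why
        // performance degrades as occupancy rises (Section 5.2).
    }
    return false;  // table full: an eviction or streaming strategy is required
}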
Table 2: The number of unique elements occupying the persistent hash table for static renders in Figure 16.
Table 2 shows the number of unique keys or elements, per LOD, occupying the persistent
world hash table for a given image, as well as the total amount of memory, in megabytes,
used by those elements. The unique keys are generated, using the proposed method, from
the static renders depicted in Figure 16.
Although the prototype, and thus the proposed algorithm, has not been fully optimized,
we perform an initial performance analysis. Table 3 shows the average render times, in
milliseconds (ms), for the images in Figures 13, 14 and 16 using their respective LOD.
These timings include all passes such as G-buffer creation, RTAO, implicit point
generation and image reconstruction.
Table 3: Render times in ms
Chapter 5: Discussion and Limitations
In the previous section we presented the results of the experiments performed using the
proposed method. These results include the metrics mean-squared error (MSE), peak
signal-to-noise ratio (PSNR) and structural similarity index (SSIM) to quantify the image
quality (Table 1), the images used to measure the mentioned metrics (Figures 11 to 14),
a chart displaying the SSIM metric evolving over several frames for two scenes (Figure
15), an overview of the hash table occupancy for different levels of detail showing both
the number of elements and their memory footprint (Table 2) and a brief summary of the
average rendering times, in milliseconds, for the given scenes (Table 3).
5.1 Image Quality
Analysing and comparing metrics such as SSIM can be complicated, as the results heavily
rely on the test case, as seen from the data presented in Table 1. For the results of the
unfiltered images, the SSIM increases when using the barycentric coordinate-based
probability technique instead of the jittered approach. This is true for most common
scenes, except for scenes with a dominating number of concave shapes or with fine
overlapping details, such as the Chinese Dragon scene and the Hairball scene. In these
cases, the quality typically improves or declines by a smaller factor than in the other
scenes. The difference in SSIM between images generated by discretizing with a fixed
offset versus with random implicit points, both using the barycentric coordinate-based
probability technique, is minimal though mostly negative. This is different for the images
that have been reconstructed using the modified Shepard interpolation. Although images
from scenes with many geometrical edges aligned to the view plane tend to slightly lose
image quality, scenes with more unaligned surfaces tend to slightly gain image quality. It
must be said that the average difference in SSIM, between images using either jittering
or barycentric coordinate-based random selection, is only ±0.035, with the largest
positive difference being ±0.104. The differences between the metrics of the
reconstructed images are displayed in Table 4.
Table 4: Differences between the metrics from reconstructed images.
When comparing the other metrics, MSE and PSNR, we observe a similar trend. Although
the proposed technique using implicit random points scores slightly worse than the one
with a fixed discretization, both again using the barycentric coordinate-based random
selection, the differences between the metrics are minimal.
When analysing the data, the images generated using the implicit point technique with
barycentric coordinate-based probability and modified Shepard interpolation tend to
perform worse, though the quality depends on the 3D noise function used by the
technique. Nevertheless, when we visually compare the images, we clearly see less
intrusive discretization artifacts when using the proposed method. Some areas that are
affected by these discretization artifacts are highlighted with red squares in Figure 17 (a,
b). Even though these metrics are widely used in computer graphics research, it is
important to be aware of the context in which these metrics are used. Comparing the
images, one can state that the reconstructed image in Figure 17 (c) is more appealing
because it contains less intrusive discretization artifacts compared to the other images.
While the used metrics are useful, for example to determine how closely a reconstructed
image matches its ground truth equivalent, they cannot be the sole parameter to evaluate
image quality. Metrics that take into account human visual sensitivity, such as contrast
and colour patterns, should be considered more frequently within computer graphics
research to determine the final image quality. This observation thus fuels the debate as
to whether image quality should be measured by how accurately a reconstructed image
matches its ground truth, or by how visually pleasing it is.
Figure 17: Artifacts after modified Shepard interpolation in world space.
(a) Fixed discretization using jittering (radius 3 LOD 11). (b) Fixed discretization using barycentric coordinates-based probability (LOD 11).
(c) Implicit points using barycentric coordinates-based probability (LOD 10). The red squares show discretization artifacts, while the
artifacts in the orange squares are due to missing data.
The orange squares seen in Figure 17 (a) are due to missing data for a particular world
position, reconstructed from the pixel depth value. The artifacts emerge when the
jittering distance is too large or when the Euclidean distance between world space
locations of the current pixel and its surrounding pixels is too large. Depending on the
random function used to jitter, the number of artifacts present might differ. Lowering the
jittering distance results in more uniform discretization artifacts, similar to the ones seen
in Figure 17 (b). While the jittered output is acceptable for screen space-based filters, it is
not for world space filters with the same characteristics as the modified Shepard
interpolation method. These artifacts are not present when using the barycentric
coordinate-based random selection as it scales according to the distances between the
implicit points. It is thus recommended to use this technique, or a proper dynamically
scaling jittered version, when using world space filtering.
While screen space filtering techniques have been proven to be very useful and
performant for real-time applications, world space filtering techniques might be
interesting when dealing with world space cached shading information. When filtering in
world space, it is possible to consider information which is currently not visible, creating
the opportunity for a more accurate reconstruction. To guarantee all surrounding data is
taken into consideration when using the implicit point technique, all neighbouring
implicit voxels need to be processed, which contrasts with the method proposed by
Binder et al. [23].
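For illustration, visiting the 26-connected neighbourhood is a straightforward iteration over all 3 x 3 x 3 cells around the current implicit voxel; processCell is a placeholder for looking up a cell's implicit points and feeding their cached shading into the world space filter:

#include <functional>

// Visits the current implicit voxel and its 26-connected neighbours at the
// current LOD, so cached shading that is not visible on screen can still
// contribute to the reconstruction.
void gatherNeighbourhood(int cx, int cy, int cz,
                         const std::function<void(int, int, int)>& processCell) {
    for (int dz = -1; dz <= 1; ++dz)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx)
                processCell(cx + dx, cy + dy, cz + dz);
}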
Compared to screen space filtering techniques, world space filtering is more
straightforward as there is no need to determine edges or other discrepancies in the
data, which is commonly the case when using filters such as an edge-avoiding à-trous
filter [42]. Even though filtering in world space can be beneficial, it can still suffer from
light and shadow leaking, as well as losing higher frequency information, due to the
discretization of continuous surfaces. By increasing the LOD, thus making the point set
denser, these issues can be kept to a minimum. The trade-off for increasing the LOD is
using more memory. As depicted in Figure 15, when using a higher LOD the image quality
also increases over time, effectively creating a better approximation. When the LOD is
low, accumulating the shading information is less effective as more world locations
contribute to the same implicit point, thus losing higher frequency details. These results
are comparable with other commonly used discretization techniques.
5.2 Memory Usage and Performance
The statistics shown in Tables 2 and 3 are captured from a static camera, using a static
scene and a fixed LOD. When comparing the number of cached unique keys with the
allocated world hash table, we observe an occupancy of ±9% for the interior scene and
±22% for the exterior scene, both using a LOD of 9. This occupancy partially determines
the performance measured in Table 3. As with other caching techniques that rely on a
hash table with linear probing, the performance goes down whenever the occupancy of
the hash table goes up. This is the result of hash collisions and the use of linear probing
for finding empty slots within the hash table. Using different hashing schemes, such as
Cuckoo hashing [43], could improve the performance, but these schemes are still
sensitive to performance hits due to occupancy. When using a dynamic camera, which captures the
scene from multiple points of view, an identical pattern can be observed. It is thus
important to keep the hash table occupancy low when real-time framerates are desirable.
The occupancy has no impact on the quality of the final image, unless there is not enough
storage to store newly acquired shading information in an entry based on a new implicit
point. Whenever there is not enough storage, an eviction strategy can be used to replace
old, rarely used, cached shading information. Another solution is adding a streaming
feature which allows the loading and unloading of less relevant cached shading
information. One must be careful when using these solutions as they might have an impact
on the actual image quality when not implemented properly.
Comparing the performance with other methods, such as Binder et al. [23], we notice
worse rendering times. This is due to the additional neighbourhood search, using a 26-
connectivity for its implicit voxels, which is necessary to guarantee proper image quality
when using implicit points. Another factor is the generation of the implicit points which
are used to create unique identifiers for storage and during world space filtering. While
the random 3D noise function used to generate the implicit points influences the image
quality, it also impacts the performance. Finding a good balance between quality and
speed is important. Jarzynski and Olano [36] provide a good overview of the quality
and the performance of different hash functions.
The memory footprint of every element is smaller compared to other point set or surfel
techniques as we do not need to store the actual position, radius and normal of every
point. Although this data is not stored, the implicitness of the technique allows it to be
generated at runtime when needed. The main difference between point set
techniques and ours is that the proposed method is volumetric, and thus might require
more subdivisions to create points closer to the actual surfaces. For certain shading
effects such as indirect lighting this is not problematic as they tend to benefit from the
volumetric approach, while other shading effects such as ambient occlusion do not. While
the proposed technique uses less memory than other point set or surfel techniques, the
memory footprint is similar to other hash-based methods. These also require higher
subdivisions for certain shading effects, although without benefiting from the
randomness of our subdivision scheme [44].
In order to lower the number of unique keys, and thus the number of implicit points, we
attempted to create a subdivision scheme which allows implicit points from a previous
LOD to be present in a given LOD. Although we succeeded in guaranteeing cross-LOD
implicit points, as discussed in section 3.1, the extra computation time needed does not
allow it to be used in real-time interactive applications as the current implementation is
on average three times slower, negating most benefits of our method. It also suffers from
the curse of dimensionality, as every subdivision in 3D has a higher chance of generating
a new implicit point closer to the surface, effectively removing the use of cross-LOD
points. This does not happen when working in 2D. The quality of the image produced with
the modified technique is theoretically superior, as the distribution is slightly better, but
due to filtering the differences are negligible. A modified version of the algorithm (Figure
6), which guarantees shared implicit points across LODs, can be found in Appendix A.
Finally, another approach to lower the number of unique keys is to use adaptive LODs.
The wanted LOD, for a given pixel, can be determined by using simple fixed distance
offsets, by using ray differentials and ray cones [45] or by defining a target feature size,
as proposed by Gautron et al. [44], which uses several parameters such as camera
aperture, resolution and distance from the camera. Testing our method with the simple
fixed distance offsets lowered the number of unique keys but also lowered the image
quality substantially, making it less useful unless it is used in very large scenes.
Although we have not tested the previously mentioned target feature size approach, we
expect its impact to be minimal as shading effects such as AO require a dense point
set due to their higher frequency nature. In section 6.2 we will briefly discuss a different
LOD approach which might be interesting to explore in the future.
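A sketch of the simple fixed distance offset scheme mentioned above; the constants and the exact mapping are illustrative, not the values used in our experiments:

#include <algorithm>
#include <cmath>

// Picks a LOD for a pixel from its distance to the camera: the subdivision
// level drops by one every time the distance doubles, lowering the density
// of the implicit point set (and thus the number of unique keys) far away.
int selectLod(float distance, int baseLod, float nearDist, int minLod) {
    int drop = (int)std::floor(std::log2(std::max(distance, nearDist) / nearDist));
    return std::max(baseLod - drop, minLod);
}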
5.3 Limitations
While our method has proven to be visually more pleasing when combined with
barycentric coordinate-based random selection and modified Shepard interpolation in
world space, we identified several limitations.
The quality of the implicit point set heavily relies on the random 3D noise function used.
Using functions that suffer from a lot of repetition or other artifacts will lower the quality
of the images produced with the proposed method. Tightly coupled to the random 3D
noise function is the seed value that is being used. In the proposed method, we generate
a seed value for each discrete location in the implicit voxel structure. This seed value is
constructed out of a sign bit, an integer part which represents the absolute world location
of the implicit voxel and a fractional part which represents the wanted subdivision or LOD
for the implicit voxel, as depicted in Figure 3. Because most 3D noise functions rely on 32-
bit seed values, we need to carefully select the number of bits for the integer and fractional
parts. Allowing more subdivisions results in a smaller overarching volume, represented by
the integer part, and vice versa. This limitation can be challenging when working with
large scenes or with large draw distances. We expect that this limitation can be overcome
by instead using 64-bit seed values, together with proper 3D noise functions that support
them, or by implementing a robust streaming feature which allows for reusing the same
unique hash key for different world locations.
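To make the trade-off concrete, the following C++ sketch packs one axis with an assumed split of 1 sign bit, 19 integer bits and 12 fractional bits, and mixes the three packed axes into a single 32-bit seed; the actual bit counts follow Figure 3 and depend on the configuration:

#include <cmath>
#include <cstdint>

// Packs one world space axis of an implicit voxel into 32 bits:
// 1 sign bit | 19-bit integer part | 12-bit fractional (subdivision) part.
// 19 integer bits cover a volume of +/-524,288 units; 12 fractional bits
// resolve 4,096 subdivisions per unit. Moving bits from one part to the
// other trades world extent against subdivision depth, and vice versa.
uint32_t packAxis(float coord) {
    uint32_t sign = coord < 0.0f ? 1u : 0u;
    float a = std::fabs(coord);
    uint32_t integer  = (uint32_t)a & 0x7FFFFu;                              // 19 bits
    uint32_t fraction = (uint32_t)((a - std::floor(a)) * 4096.0f) & 0xFFFu;  // 12 bits
    return (sign << 31) | (integer << 12) | fraction;
}

// Combines the three packed axes into one 32-bit seed for the 3D noise
// function; any well-behaved integer mix works here.
uint32_t makeSeed(float x, float y, float z) {
    uint32_t s = packAxis(x);
    s = (s * 0x9E3779B9u) ^ packAxis(y);
    s = (s * 0x85EBCA6Bu) ^ packAxis(z);
    return s;
}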
The proposed method also suffers from potential light and shadow leaks. This can be
partially resolved by further enhancing the unique hash key generation. Other hash-based
methods, such as Binder et al. [23] and Roth et al. [46], use the normal of the surface
during the generation of the unique key to differentiate between differently oriented
surfaces at the discretised locations in world space. Pantaleoni [24] uses radiance
fields instead of normals, effectively introducing a directional component on top of a
spatial component during the discretization. Be aware that every differentiation between
implicit points, fixed or random, requires a unique entry within the hash table, thus
increasing the amount of memory used. While introducing more parameters during the
discretization phase might resolve certain artifacts, the memory footprint will likely
increase as well. Our method has only been tested using spatial discretization without
taking into account the normal or any other, spatial or temporal, data.
Chapter 6: Conclusion and Future Work
In the previous section we discussed the results of the experiments performed using the
proposed method. We reviewed the image quality, the memory usage, the performance,
and current limitations. In this section we will reflect on these results, link the results with
the initial research questions and briefly describe which techniques or concepts can be
part of future research.
6.1 Conclusion
In this thesis we have presented a technique which uses random implicit points, instead
of fixed offsets, to discretize the environment. This results in a discretization scheme
which is comparable to a Voronoi pattern. We introduced more randomness near the
edges of each implicit cell by randomly selecting one of three closest points using a
barycentric coordinate-based probability. The final reconstruction uses a modified
Shepard interpolation approach to filter the cached information in world space. Because
the information is stored in world space, cached shading information from previously
visible surfaces, recorded during previous frames, can be effectively reused. It also
removes the need for motion vectors and reprojection in order to use previous
samples for the reconstruction. It thus effectively eliminates some of the limitations of
commonly used screen space filtering techniques.
While the image quality metrics suggest that the reconstructed images, using the
proposed technique, are slightly less accurate compared to using fixed discretization and
jittered sampling, the images noticeably contain fewer artifacts. In scenes where the
geometry is not aligned to the camera view plane or where the surfaces are more curved,
the metrics tend to favour the proposed method. Comparing the metrics and the visual
reconstructed output, one can argue that the commonly used quality metrics in computer
graphics research are not sufficient to solely judge the overall image quality. While the
metrics can be useful in many scenarios, they are not effective in every context. Image
quality metrics that are designed from a human-oriented perspective, as discussed by
Kim et al. [47], should be explored and potentially adopted.
Answering our initial research questions, we were able to implicitly create random 3D
points, effectively simulating a point set. Because of the hashing scheme used to generate
the seed values for the 3D noise function, a volume can be progressively subdivided, hence
supporting adaptive LOD to control the point set density. We attempt to guarantee a
decent distribution by remapping every implicit point within its respective implicit voxel,
thus avoiding clumping to some extent. All of this can be achieved without any expensive
memory lookups. During filtering, querying our point set is kept to a minimum by only
considering the current and the neighbouring voxels within the current LOD. While it does
have an impact on performance, it improves the image quality of the reconstructed image.
Finally, all shading information is cached in a hash table, similar to other hash-based
techniques. While the performance is partially dependent on the hash table occupancy, it
is still the most desirable data structure within a massively parallel environment due to its
simplicity and its excellent random-access support.
6.2 Future Work
During future research, the proposed technique should be extended and tested with
certain features which are already present in similar techniques. First, the technique
should be adjusted to support dynamic scenes using temporal integration and temporal
filtering. The hash generation can also be extended by considering the normals of the
discretized surfaces to further differentiate between the cached shading information.
To improve performance, the hash table implementation can be replaced with a scheme
which relies on Cuckoo hashing. Previous research [46] has shown this can drastically
improve overall performance, as long as the occupancy stays below 50%. Another option
is to develop a streaming scheme which allows for unique keys to be reused, while data
from older, no longer visible, implicit points is unloaded and stored on the hard drive.
When exploring this option, it is important to carefully consider the data transfer between
the GPU and the CPU, while testing it with large scenes.
The memory usage could be further reduced by exploring different LOD schemes. While
most methods rely on distance-based or feature size-based techniques, the subdivisions
or LOD could potentially be determined by approximating the local shading complexity.
We have implemented a first attempt where we tried to determine the local shading
complexity using the minmax-map approach, which has been effectively used for
multiresolution splatting in screen space [48], [49]. The minmax-map was generated from
the ray traced ambient occlusion map instead of the normal and depth buffer, as this map
accounts for shading information based on geometry both on- and offscreen. While this
first attempt shows some promising initial results, as depicted in Figure 18, it suffers from
artifacts near the borders of the image and near unstable noisy areas. The latter is due to
the noisy nature of the raw RTAO buffer which is generated using only 1 sample per pixel.
Figure 18: minmax-map approach to estimate local shading complexity for selecting the wanted LOD.
Another interesting alternative would be to project the random implicit points onto the surfaces
in the scene. To make a distinction between the different surfaces, and thus the implicit points,
the surface ID will need to be considered when generating the unique hash key. While this will
eliminate the volumetric characteristics of our approach, it might reduce the number of points
needed to better approximate shading effects such as ambient occlusion.
Finally, all results should be re-evaluated using different image quality metrics. These metrics
ideally consider human visual sensitivity to better account for intrusive artifacts.
Appendix A
Implicit point generation algorithm, modified to guarantee samples across LODs (pseudocode).
Bibliography
[1] F. C. Crow, "The Aliasing Problem in Computer-Generated Shaded Images," Commun. ACM, vol. 20, no. 11, pp. 799–805, 1977, doi: 10.1145/359863.359869.
[2] R. L. Cook, "Stochastic sampling in computer graphics," ACM Trans. Graph., vol. 5, no. 1, pp. 51–72, 1986, doi: 10.1145/7529.8927.
[3] M. A. Dippé and E. H. Wold, "Antialiasing Through Stochastic Sampling," Comput. Graph., vol. 19, no. 3, pp. 69–78, 1985, doi: 10.1145/325165.325182.
[4] P. H. Christensen, "Point-Based Global Illumination for Movie Production," 2010.
[5] J. Bikker and R. Reijerse, "A Precalculated Point Set for Caching Shading Information," Eurographics 2009, pp. 1–4, 2009.
[6] M. Krcmar, K. Farrar, and R. McGloin, "The effects of video game realism on attention, retention and aggressive outcomes," Comput. Human Behav., vol. 27, no. 1, pp. 432–439, 2011, doi: 10.1016/j.chb.2010.09.005.
[7] N. Wang and W. Doube, "How real is reality? A perceptually motivated system for quantifying visual realism in digital images," Proc. 2011 Int. Conf. Multimedia Signal Process., vol. 2, pp. 141–149, 2011, doi: 10.1109/CMSP.2011.172.
[8] S. Lyu and H. Farid, "How realistic is photorealistic?," IEEE Trans. Signal Process., vol. 53, no. 2, pp. 845–850, 2005, doi: 10.1109/TSP.2004.839896.
[9] J. T. Kajiya, "The rendering equation," Proc. 13th Annu. Conf. Comput. Graph. Interact. Tech. (SIGGRAPH 1986), vol. 20, no. 4, pp. 143–150, 1986, doi: 10.1145/15922.15902.
[10] R. E. Caflisch, "Monte Carlo and quasi-Monte Carlo methods," Acta Numer., vol. 7, pp. 1–49, 1998, doi: 10.1017/S0962492900002804.
[11] G. J. Ward, F. M. Rubinstein, and R. D. Clear, "A ray tracing solution for diffuse interreflection," Proc. 15th Annu. Conf. Comput. Graph. Interact. Tech. (SIGGRAPH 1988), vol. 22, no. 4, pp. 85–92, 1988, doi: 10.1145/54852.378490.
[12] O. Good and Z. Taylor, "Optimized photon tracing using spherical harmonic light maps," ACM SIGGRAPH 2005 Sketches, p. 53, 2005, doi: 10.1145/1187112.1187175.
[13] "Real-time Raytracing for Interactive Global Illumination Workflows in Frostbite." https://www.ea.com/frostbite/news/real-time-raytracing-for-interactive-global-illumination-workflows-in-frostbite (accessed Oct. 25, 2020).
[14] K. E. Hillesland and J. C. Yang, "Texel shading," EUROGRAPHICS 2016 Short Papers, pp. 73–76, 2016, doi: 10.2312/egsh.20161018.
[15] M. Andersson, J. Hasselgren, R. Toth, and T. Akenine-Möller, "Adaptive texture space shading for stochastic rendering," Comput. Graph. Forum, vol. 33, no. 2, pp. 341–350, 2014, doi: 10.1111/cgf.12303.
[16] J. Munkberg, J. Hasselgren, P. Clarberg, M. Andersson, and T. Akenine-Möller, "Texture space caching and reconstruction for ray tracing," ACM Trans. Graph., vol. 35, no. 6, pp. 1–13, 2016, doi: 10.1145/2980179.2982407.
[17] A. Panteleev, "Practical Real-Time Voxel-Based Global Illumination for Current GPUs," 2014. [Online]. Available: http://on-demand.gputechconf.com/gtc/2014/presentations/S4552-rt-voxel-based-global-illumination-gpus.pdf.
[18] C. C. Tanner, C. J. Migdal, and M. T. Jones, "The clipmap: A virtual mipmap," Proc. 25th Annu. Conf. Comput. Graph. Interact. Tech. (SIGGRAPH 1998), pp. 151–158, 1998, doi: 10.1145/280814.280855.
[19] C. Crassin, F. Neyret, M. Sainz, S. Green, and E. Eisemann, "Interactive indirect illumination using voxel-based cone tracing: An insight," ACM SIGGRAPH 2011 Talks, vol. 30, no. 7, 2011, doi: 10.1145/2037826.2037853.
[20] G. Greger, P. Shirley, P. M. Hubbard, and D. P. Greenberg, "The irradiance volume," IEEE Comput. Graph. Appl., vol. 18, no. 2, pp. 32–43, 1998, doi: 10.1109/38.656788.
[21] J. Hu, M. Yip, E. Alonso, S. Gu, X. Tang, and X. Jin, "Signed Distance Fields Dynamic Diffuse Global Illumination."
[22] M. McGuire, M. Mara, D. Nowrouzezahrai, and D. Luebke, "Real-time global illumination using precomputed light field probes," Proc. I3D 2017: 21st ACM SIGGRAPH Symp. Interact. 3D Graph. Games, 2017, doi: 10.1145/3023368.3023378.
[23] N. Binder, S. Fricke, and A. Keller, "Massively Parallel Path Space Filtering," 2019. [Online]. Available: http://arxiv.org/abs/1902.05942.
[24] J. Pantaleoni, "Online path sampling control with progressive spatio-temporal filtering," SN Comput. Sci., pp. 1–16, 2020, doi: 10.1007/s42979-020-00291-z.
[25] A. Keller, K. Dahm, and N. Binder, "Path space filtering," ACM SIGGRAPH 2014 Talks, 2014, doi: 10.1145/2614106.2614149.
[26] C. Schied et al., "Spatiotemporal variance-guided filtering: Real-time reconstruction for path-traced global illumination," Proc. High Perform. Graph. (HPG 2017), 2017, doi: 10.1145/3105762.3105770.
[27] D. Scherzer et al., "Temporal coherence methods in real-time rendering," Comput. Graph. Forum, vol. 31, no. 8, pp. 2378–2408, 2012, doi: 10.1111/j.1467-8659.2012.03075.x.
[28] C. Schied, C. Peters, and C. Dachsbacher, "Gradient Estimation for Real-time Adaptive Temporal Filtering," Proc. ACM Comput. Graph. Interact. Tech., vol. 1, no. 2, pp. 1–16, 2018, doi: 10.1145/3233301.
[29] J. Lehtinen et al., "A meshless hierarchical representation for light transport," ACM SIGGRAPH 2008 Papers, vol. 27, no. 3, pp. 1–9, 2008, doi: 10.1145/1399504.1360636.
[30] D. Cline, S. Jeschke, K. White, A. Razdan, and P. Wonka, "Dart throwing on surfaces," Comput. Graph. Forum, vol. 28, no. 4, pp. 1217–1226, 2009, doi: 10.1111/j.1467-8659.2009.01499.x.
[31] T. Stachowiak, "Stochastic all the things: Raytracing in hybrid real-time rendering," Digital Dragons, 2018. https://www.ea.com/seed/news/seed-dd18-presentation-slides-raytracing (accessed Oct. 29, 2020).
[32] P. Christensen, A. Kensler, and C. Kilpatrick, "Progressive Multi-Jittered Sample Sequences," Comput. Graph. Forum, vol. 37, no. 4, pp. 21–33, 2018, doi: 10.1111/cgf.13472.
[33] D. Shepard, "A two-dimensional interpolation function for irregularly-spaced data," Proc. 23rd ACM National Conference, pp. 517–524, 1968.
[34] Z. Dong, J. Kautz, C. Theobalt, and H. P. Seidel, "Interactive global illumination using implicit visibility," Proc. Pacific Conf. Comput. Graph. Appl., pp. 77–86, 2007, doi: 10.1109/PG.2007.38.
[35] M. E. O'Neill, "PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation," Harvey Mudd College Computer Science Department Technical Report, 2014. [Online]. Available: www.cs.hmc.edu.
[36] M. Jarzynski and M. Olano, "Hash Functions for GPU Rendering," J. Comput. Graph. Tech., vol. 9, no. 3, pp. 20–38, 2020. [Online]. Available: http://jcgt.org.
[37] A. Dietrich, J. Schmittler, and P. Slusallek, "World-Space Sample Caching for Efficient Ray Tracing of Highly Complex Scenes," 2006.
[38] S. van der Walt et al., "scikit-image: Image processing in Python," PeerJ, vol. 2, p. e453, 2014, doi: 10.7717/peerj.453.
[39] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004, doi: 10.1109/TIP.2003.819861.
[40] "McGuire Computer Graphics Archive." https://casual-effects.com/data/ (accessed Jun. 16, 2021).
[41] "A Simple GPU Hash Table - Nosferalatu." https://nosferalatu.com/SimpleGPUHashTable.html (accessed Jun. 17, 2021).
[42] H. Dammertz, D. Sewtz, J. Hanika, and H. P. A. Lensch, "Edge-avoiding À-Trous wavelet transform for fast global illumination filtering," Proc. High-Performance Graphics (HPG), pp. 67–75, 2010.
[43] R. Pagh, "Cuckoo Hashing," Encyclopedia of Algorithms, pp. 1–5, 2015, doi: 10.1007/978-3-642-27848-8_97-2.
[44] P. Gautron, "Real-time ray-traced ambient occlusion of complex scenes using spatial hashing," ACM SIGGRAPH 2020 Talks, 2020, doi: 10.1145/3388767.3407375.
[45] T. Akenine-Möller, J. Nilsson, M. Andersson, C. Barré-Brisebois, R. Toth, and T. Karras, "Texture Level of Detail Strategies for Real-Time Ray Tracing," in Ray Tracing Gems: High-Quality and Real-Time Rendering with DXR and Other APIs, E. Haines and T. Akenine-Möller, Eds. Berkeley, CA: Apress, 2019, pp. 321–345.
[46] T. Roth et al., "Hash-Based Hierarchical Caching and Layered Filtering for Interactive Previews in Global Illumination Rendering," Computers, vol. 9, no. 1, 2020, doi: 10.3390/computers9010017.
[47] J. Kim and S. Lee, "Deep learning of human visual sensitivity in image quality assessment framework," Proc. 30th IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR 2017), pp. 1969–1977, 2017, doi: 10.1109/CVPR.2017.213.
[48] G. Nichols and C. Wyman, "Multiresolution splatting for indirect illumination," Proc. I3D 2009: ACM SIGGRAPH Symp. Interact. 3D Graph. Games, pp. 83–90, 2009, doi: 10.1145/1507149.1507162.
[49] G. Nichols, R. Penmatsa, and C. Wyman, "Interactive, multiresolution image-space rendering for dynamic area lighting," Comput. Graph. Forum, vol. 29, no. 4, pp. 1279–1288, 2010, doi: 10.1111/j.1467-8659.2010.01723.x.
This template is based on a template by:
Steve Gunn (http://users.ecs.soton.ac.uk/srg/softwaretools/document/templates/)
Sunil Patel (http://www.sunilpatel.co.uk/thesis-template/)
Template license:
CC BY-NC-SA 3.0 (http://creativecommons.org/licenses/by-nc-sa/3.0/)