Perceptual error optimization for Monte Carlo rendering
VASSILLEN CHIZHOV, MIA Group, Saarland University, Max-Planck-Institut für Informatik, Germany
ILIYAN GEORGIEV, Autodesk, United Kingdom
KAROL MYSZKOWSKI, Max-Planck-Institut für Informatik, Germany
GURPRIT SINGH, Max-Planck-Institut für Informatik, Germany
Fig. 1. We devise a perceptually based model to optimize the error of Monte Carlo renderings. Here we show our vertical iterative minimization algorithm from Section 4.1: given 4 input samples per pixel (spp), it selects a subset of them to produce an image with substantially improved visual fidelity over a simple 4-spp average. The optimization is guided by a surrogate image obtained by regularizing the noisy input; we also show using the ground-truth image as a guide. The power spectrum of the image error, computed on 32×32-pixel tiles, indicates that our method distributes pixel error with locally blue-noise characteristics. (Panels, left to right: 4-spp average; ours w.r.t. surrogate; ours w.r.t. ground truth; tiled error power spectrum.)
Synthesizing realistic images involves computing high-dimensional light-transport integrals. In practice, these integrals are numerically estimated via Monte Carlo integration. The error of this estimation manifests itself as conspicuous aliasing or noise. To ameliorate such artifacts and improve image fidelity, we propose a perception-oriented framework to optimize the error of Monte Carlo rendering. We leverage models based on human perception from the halftoning literature. The result is an optimization problem whose solution distributes the error as visually pleasing blue noise in image space. To find solutions, we present a set of algorithms that provide varying trade-offs between quality and speed, showing substantial improvements over prior state of the art. We perform evaluations using quantitative and perceptual error metrics, and provide extensive supplemental material to demonstrate the perceptual improvements achieved by our methods.
CCS Concepts: • Computing methodologies → Ray tracing; Image processing.

Additional Key Words and Phrases: Monte Carlo, rendering, sampling, perceptual error, blue noise, halftoning, dithering, error diffusion
ACM Reference Format:
Vassillen Chizhov, Iliyan Georgiev, Karol Myszkowski, and Gurprit Singh. 2022. Perceptual error optimization for Monte Carlo rendering. ACM Trans. Graph. 41, 3, Article 26 (June 2022), 17 pages. https://doi.org/10.1145/3504002

Authors' addresses: Vassillen Chizhov, MIA Group, Saarland University, and Max-Planck-Institut für Informatik, Saarbrücken, Germany; Iliyan Georgiev, Autodesk, United Kingdom; Karol Myszkowski, Max-Planck-Institut für Informatik, Saarbrücken, Germany; Gurprit Singh, Max-Planck-Institut für Informatik, Saarbrücken, Germany.

© 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Transactions on Graphics, https://doi.org/10.1145/3504002.
1 INTRODUCTION
Monte Carlo sampling produces approximation error. In rendering,
this error can cause visually displeasing image artifacts, unless con-
trol is exerted over the correlation of the individual pixel estimates.
A standard approach is to decorrelate these estimates by random-
izing the samples independently for every pixel, turning potential
structured artifacts into white noise.
In digital halftoning, the error induced by quantizing continuous-tone images has been studied extensively. Such studies have shown that a blue-noise distribution of the quantization error is perceptually optimal [Ulichney 1987], achieving substantially higher image fidelity than a white-noise distribution. Recent works have proposed empirical means to transfer these ideas to image synthesis [Georgiev and Fajardo 2016; Heitz and Belcour 2019; Heitz et al. 2019; Ahmed and Wonka 2020]. Instead of randomizing the pixel estimates, these methods introduce negative correlation between neighboring pixels, exploiting the local smoothness in images to push the estimation error to the high-frequency spectral range.
We propose a theoretical formulation of perceptual error for image synthesis which unifies prior methods in a common framework and formally justifies the desire for blue-noise error distribution. We extend the comparatively simpler problem of digital halftoning [Lau and Arce 2007], where the ground-truth image is given, to the substantially more complex one of rendering, where the ground truth is the sought result and thus unavailable. Our formulation bridges the gap between multi-tone halftoning and rendering by interpreting Monte Carlo estimates for a pixel as its admissible 'quantization levels'. This insight allows virtually any halftoning method to be
adapted to rendering. We demonstrate this for the three main classes of halftoning algorithms: dither-mask halftoning, error-diffusion halftoning, and iterative energy-minimization halftoning.

ACM Trans. Graph., Vol. 41, No. 3, Article 26. Publication date: June 2022.
arXiv:2012.02344v6 [cs.GR] 5 Apr 2022
Existing methods [Georgiev and Fajardo 2016; Heitz and Belcour 2019; Heitz et al. 2019] can be seen as variants of dither-mask halftoning. They distribute pixel error according to masks that are optimized w.r.t. a target kernel, typically a Gaussian. The kernel can be interpreted as an approximation to the human visual system's point spread function [Daly 1987; Pappas and Neuhoff 1999]. We revisit the kernel-based perceptual model from halftoning [Sullivan et al. 1991; Analoui and Allebach 1992; Pappas and Neuhoff 1999] and adapt it to rendering. The resulting energy can be directly used for optimizing Monte Carlo error distribution without the need for a mask. This formulation helps us expose the underlying assumptions of existing methods and quantify their limitations. In summary:
• We formulate an optimization problem for rendering error by leveraging kernel-based perceptual models from halftoning.
• Our formulation unifies prior blue-noise error distribution methods and makes all their assumptions explicit, outlining general guidelines for devising new methods in a principled manner.
• Unlike prior methods, our formulation simultaneously optimizes for both the magnitude and the image distribution of pixel error.
• We devise four different practical algorithms based on iterative minimization, error diffusion, and dithering from halftoning.
• We demonstrate substantial visual improvements over prior art, while using the same input rendering data.
2 RELATED WORK
Our work focuses on reducing and optimizing the distribution of
Monte Carlo pixel-estimation error. In this section we review prior
work with similar goals in digital halftoning (Section 2.1) and image
synthesis guided by energy-based (Section 2.2) and perception-based
(Section 2.3) error metrics. We achieve error reduction through care-
ful sample placement and processing, and discuss related rendering
approaches (Section 2.4).
2.1 Digital haloning
Digital halftoning [Lau and Arce 2007] involves creating the illu-
sion of continuous-tone images through the arrangement of binary
elements; various algorithms target dierent display devices. Bayer
[1973] developed the widely used dispersed-dot ordered-dither pat-
terns. Allebach and Liu [1976] introduced the use of randomness in
clustered-dot ordered dithering. Ulichney [1987] introduced blue-
noise patterns that yield better perceptual quality, and Mitsa and
Parker [1991] mimicked those patterns to produce dither arrays (i.e.,
masks) with high-frequency characteristics. Sullivan et al
.
[1991]
developed a Fourier-domain energy function to obtain visually opti-
mal halftone patterns; the optimality is dened w.r.t. computational
models of the human visual system. Analoui and Allebach [1992]
devised a practical algorithm for blue-noise dithering through a
spatial-domain interpretation of Sullivan et al
.
’s model. Their ap-
proach was later rened by Pappas and Neuho [1999].
The void-and-cluster algorithm [Ulichney 1993] uses a Gaussian
kernel to create dither masks with isotropic blue-noise distribu-
tion. This approach has motivated various structure-aware halfton-
ing algorithms in graphics [Ostromoukhov 2001;Pang et al
.
2008;
Chang et al
.
2009]. In the present work, we leverage the kernel-based
model [Analoui and Allebach 1992;Pappas and Neuho 1999] in
the context of Monte Carlo rendering [Kajiya 1986].
2.2 Quantitative error assessment in rendering
It is convenient to measure the error of a rendered image as a single value; vector norms like the mean squared error (MSE) are most commonly used. However, it is widely acknowledged that such simple metrics do not accurately reflect visual quality as they ignore the perceptually important spatial arrangement of pixels. Various theoretical frameworks have been developed in the spatial [Niederreiter 1992; Kuipers and Niederreiter 1974] and Fourier [Singh et al. 2019] domains to understand the error reported through these metrics. The error spectrum ensemble [Celarek et al. 2019] measures the frequency-space distribution of the error.

Many denoising methods [Zwicker et al. 2015] employ the aforementioned metrics to obtain noise-free results from noisy renderings. Even if the most advanced denoising techniques driven by such metrics can efficiently steer adaptive sampling [Chaitanya et al. 2017; Kuznetsov et al. 2018; Kaplanyan et al. 2019], they locally determine the number of samples per pixel, ignoring the aspect of their specific layout in screen space.

Our optimization framework employs a perceptual MSE-based metric that accounts for both the magnitude and the spatial distribution of pixel-estimation error. We argue that the spatial sample layout plays a crucial role in the perception of a rendered image; the most commonly used error metrics do not capture this aspect.
2.3 Perceptual error assessment in rendering
The study of the human visual system (HVS) is still ongoing, and well understood are mostly the early stages of the visual pathways from the eye optics, through the retina, to the visual cortex. This limits the scope of existing HVS computational models used in imaging and graphics. Such models should additionally be computationally efficient and generalize over the simplistic stimuli that have been used in their derivation through psychophysical experiments.

Contrast sensitivity function. The contrast sensitivity function (CSF) is one of the core HVS models that fulfills the above conditions and comprehensively characterizes overall optical [Westheimer 1986; Deeley et al. 1991] and neural [Souza et al. 2011] processes in detecting contrast visibility as a function of spatial frequency. While originally modeled as a band-pass filter [Barten 1999; Daly 1992], the CSF's shape changes towards a low-pass filter with retinal eccentricity [Robson and Graham 1981; Peli et al. 1991] and reduced luminance adaptation at scotopic and mesopic levels [Wuerger et al. 2020]. Low-pass characteristics are also inherent to chromatic CSFs [Mullen 1985; Wuerger et al. 2020; Bolin and Meyer 1998]. In many practical imaging applications, e.g., JPEG compression [Rashid et al. 2005], rendering [Ramasubramanian et al. 1999], or halftoning [Pappas and Neuhoff 1999], the CSF is modeled as a low-pass filter, which also allows for better control of image intensity. By normalizing such a CSF by the maximum contrast-sensitivity value, a unitless function akin to the modulation transfer function (MTF) can be derived [Daly 1987; Mannos and Sakrison 1974; Mantiuk et al. 2005; Sullivan et al. 1991; Souza et al. 2011] that, after transforming from the frequency to the spatial domain, results in the point spread function (PSF) [Analoui and Allebach 1992; Pappas and Neuhoff 1999]. Following Pappas and Neuhoff [1999], we approximate such a PSF by a Gaussian filter; the resulting error is practically negligible for a pixel density of 300 dots per inch (dpi) and an observer-to-screen distance larger than 60 cm.
Advanced quality metrics. More costly, and often less robust, modeling of the HVS beyond the CSF is performed in advanced quality metrics [Lubin 1995; Daly 1992; Mantiuk et al. 2011]. Such metrics have been adapted to rendering to guide the computation to image regions where the visual error is most strongly perceived [Bolin and Meyer 1995, 1998; Ramasubramanian et al. 1999; Ferwerda et al. 1996; Myszkowski 1998; Volevich et al. 2000]. An important application is visible noise reduction in path tracing via content-adaptive sample-density control [Bolin and Meyer 1995, 1998; Ramasubramanian et al. 1999]. Our framework enables significant reduction of noise visibility for the same sampling budget.
2.4 Blue-noise error distribution in rendering
Mitchell [1991] first observed that high-frequency error distribution is desirable for stochastic rendering. Only recently, Georgiev and Fajardo [2016] adopted techniques from halftoning to correlate pixel samples in screen space and distribute path-tracing error as blue noise, with substantial perceptual quality improvements. Heitz et al. [2019] built on this idea to develop a progressive quasi-Monte Carlo sampler that further improves quality. Ahmed and Wonka [2020] proposed a technique to coordinate quasi-Monte Carlo samples in screen space, inspired by error diffusion.

Motivated by the results of Georgiev and Fajardo [2016], Heitz and Belcour [2019] devised a method to directly optimize the distribution of pixel estimates, without operating on individual samples. Their pixel-permutation strategy fits the initially white-noise pixel intensities to a prescribed blue-noise mask. This approach scales well with sample count and dimension, though its reliance on prior pixel estimates makes it practical only for animation rendering, where it is susceptible to quality degradation.

We propose a perceptual error framework that unifies these two general approaches, exposing the assumptions of existing methods and providing guidelines to alleviate some of their drawbacks.
3 PERCEPTUAL ERROR MODEL
Our aim is to produce Monte Carlo renderings that, at a fixed sampling rate, are perceptually as close to the ground truth as possible. This goal requires formalizing the perceptual image error along with an optimization problem that minimizes it. In this section, we build a perceptual model upon the extensive studies done in the halftoning literature. We will discuss how to efficiently solve the resulting optimization problem in Section 4.
Given a ground-truth image 𝐼 and its quantized or stochastic approximation 𝑄, we denote the (signed) error image by

    𝜖 = 𝑄 − 𝐼.    (1)

To minimize the error, it is convenient to quantify it as a single value. A common approach is to take the ℓ1, ℓ2, or ℓ∞ norm of the
[Figure: columns show Image, Image spectrum, Kernel spectrum, and Product spectrum, for a white-noise error image 𝜖w (top row) and a blue-noise error image 𝜖b (bottom row).]

Fig. 2. Error images 𝜖w and 𝜖b with respective white-noise, |𝜖̂w|², and blue-noise, |𝜖̂b|², power spectra. For a low-pass kernel 𝑔 modeling the PSF of the HVS (here a Gaussian with std. dev. 𝜎 = 1), the product of its spectrum |𝑔̂|² with |𝜖̂b|² has lower magnitude than the product with |𝜖̂w|². This corresponds to lower perceptual sensitivity to 𝜖b, even though 𝜖w has the same amplitude, as it is obtained by randomly permuting the pixels of 𝜖b.
image 𝜖 interpreted as a vector. Such simple metrics are permutation-invariant, i.e., they account for the magnitudes of individual pixel errors but not for their distribution over the image. This distribution is an important factor for the perceived fidelity, since contrast perception is an inherently spatial characteristic of the HVS (Section 2.3). Our model is based on perceptual halftoning metrics that capture both the magnitude and the distribution of error.
3.1 Motivation
Halftoning metrics model the processing done by the HVS as a convolution of the error image 𝜖 with a kernel 𝑔:

    𝐸 = ‖𝑔 ⊛ 𝜖‖₂² = ‖𝑔̂ · 𝜖̂‖₂² = ⟨|𝑔̂|², |𝜖̂|²⟩.    (2)
The convolution is equivalent to the element-wise product of the corresponding Fourier spectra 𝑔̂ and 𝜖̂, whose 2-norm in turn equals the inner product of the power-spectra images |𝑔̂|² and |𝜖̂|². Sullivan et al. [1991] optimized the error image 𝜖 to minimize the error (2) w.r.t. a kernel 𝑔 that approximates the HVS's modulation transfer function 𝑔̂ (MTF) [Daly 1987]. Analoui and Allebach [1992] used a similar model in the spatial domain with a kernel that approximates the PSF¹ of the human eye. That kernel is low-pass, and the optimization naturally yields a blue-noise² distribution in the error image [Analoui and Allebach 1992], as we show later in Fig. 5. The blue-noise distribution can thus be seen as a byproduct of the optimization, which pushes the spectral components of the error to the frequencies least visible to the human eye (see Fig. 2).
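The spatial/Fourier equivalence in Eq. (2) can be checked numerically. The following NumPy sketch (periodic boundaries, hypothetical sizes and names) evaluates the perceptual energy both as a spatial convolution and as an inner product of power spectra:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
eps = rng.standard_normal((n, n))           # error image ε

# Gaussian kernel g on the periodic pixel grid (std. dev. 1 px)
x = np.arange(n)
d = np.minimum(x, n - x)                    # circular distance to the origin
g = np.exp(-(d[:, None]**2 + d[None, :]**2) / 2.0)
g /= g.sum()

# Spatial domain: E = ||g ⊛ ε||² (circular convolution via FFT)
conv = np.real(np.fft.ifft2(np.fft.fft2(g) * np.fft.fft2(eps)))
E_spatial = np.sum(conv**2)

# Fourier domain: E = ⟨|ĝ|², |ε̂|²⟩, with the DFT's 1/N normalization
g_hat2 = np.abs(np.fft.fft2(g))**2
eps_hat2 = np.abs(np.fft.fft2(eps))**2
E_fourier = np.sum(g_hat2 * eps_hat2) / n**2

assert np.allclose(E_spatial, E_fourier)
```

The 1/𝑛² factor accounts for NumPy's unnormalized forward DFT (Parseval's theorem); with a unitary transform the two sums would match directly.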
To better understand the spatial aspects of contrast sensitivity
in the HVS, the MTF is usually modeled over a range of viewing
distances [Daly 1992]. This is done to account for the fact that with
increasing viewer distance, spatial frequencies in the image are
¹The MTF is the magnitude of the Fourier transform of the PSF.
²The term "blue noise" is often used loosely to refer to any isotropic spectrum with minimal low-frequency content and no concentrated energy spikes.
Fig. 3. The appearance of blue noise (left images) converges to a constant image faster than white noise (right images) with increasing observer distance, here emulated via the standard deviation 𝜎 of a Gaussian kernel (𝜎 = 0, 0.25, 0.5, 1). We provide a formal connection between 𝜎 and the viewing distance in Section 6.
projected to higher spatial frequencies onto the retina. These frequencies eventually become invisible, filtered out by the PSF, whose corresponding kernel expands in image space. We recreate this experiment to see the impact of distance on the image error. In Fig. 3, we convolve white- and blue-noise distributions with a Gaussian kernel of increasing standard deviation, corresponding to increasing observer-to-screen distance. The high-frequency blue-noise distribution reaches a homogeneous state (where the tone appears constant) faster compared to the all-frequency white noise. This means that high-frequency error becomes indiscernible at closer viewing distances, where the HVS ideally has not yet started filtering out actual image detail, which is typically low- to mid-frequency. In Section 6 we discuss how the kernel's standard deviation encodes the viewing distance w.r.t. the screen resolution.
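The Fig. 3 experiment can be reproduced in a few lines. The sketch below builds a crude blue-noise-like image by high-pass filtering white noise (a stand-in for a properly optimized blue-noise mask; the cutoff radius and function names are our own assumptions) and confirms that, at equal energy, its residual after Gaussian blurring decays faster:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
white = rng.standard_normal((n, n))

# Crude blue-noise-like error: strip the low-frequency content of the
# white noise, then rescale to the original energy (illustrative only).
f = np.fft.fftfreq(n)
radius = np.hypot(f[:, None], f[None, :])
high_pass = radius > 0.15                   # keep only high frequencies
blue = np.real(np.fft.ifft2(np.fft.fft2(white) * high_pass))
blue *= np.linalg.norm(white) / np.linalg.norm(blue)

def blur_energy(img, sigma):
    """Energy left after convolving with a periodic Gaussian of std sigma."""
    x = np.arange(n)
    d = np.minimum(x, n - x)
    g = np.exp(-(d[:, None]**2 + d[None, :]**2) / (2 * sigma**2))
    g /= g.sum()
    out = np.real(np.fft.ifft2(np.fft.fft2(g) * np.fft.fft2(img)))
    return np.sum(out**2)

# With increasing sigma (viewing distance), blue noise fades faster.
for sigma in (0.5, 1.0, 2.0):
    assert blur_energy(blue, sigma) < blur_energy(white, sigma)
```

The inequality holds for any 𝜎 > 0 here because the Gaussian attenuates the high band, where all of the blue image's energy lives, more than the low band that only the white image occupies.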
3.2 Our model
In rendering, the value of each pixel 𝑖 is a light-transport integral. Point-sampling its integrand with a sample set 𝑆 yields a pixel estimate 𝑄ᵢ(𝑆). The signed pixel error is thus a function of the sample set: 𝜖ᵢ(𝑆) = 𝑄ᵢ(𝑆) − 𝐼ᵢ, where 𝐼ᵢ is the reference (i.e., ground-truth) pixel value. The error of the entire image can be written as

    𝜖(𝑆) = 𝑄(𝑆) − 𝐼,    (3)

where 𝑆 = {𝑆₁, . . . , 𝑆_𝑁} is an "image" containing the sample set for all 𝑁 pixels. With these definitions, we can express the perceptual error in Eq. (2) for the case of Monte Carlo rendering as a function of the sample-set image 𝑆, given a kernel 𝑔:

    𝐸(𝑆) = ‖𝑔 ⊛ 𝜖(𝑆)‖₂².    (4)

Our goal is to minimize the perceptual error (4). We formulate this task as an optimization problem:

    min_{𝑆∈Ω} 𝐸(𝑆) = min_{𝑆∈Ω} ‖𝑔 ⊛ (𝑄(𝑆) − 𝐼)‖₂².    (5)
The minimizing sample-set image 𝑆 yields an image estimate 𝑄(𝑆) that is closest to the reference 𝐼 w.r.t. the kernel 𝑔. The search space Ω is the set of all possible locations for every sample of every pixel. The total number of samples in 𝑆 is typically bounded by a given target sampling budget. Practical considerations may also restrict the search space Ω, as we will exemplify in the following section.
Note that the classical MSE metric corresponds to using a zero-width (i.e., one-pixel) kernel 𝑔 in Eq. (4). However, the MSE accounts only for the magnitude of the error 𝜖, while using wider kernels (such as the PSF) accounts for both magnitude and distribution. Consequently, while the MSE can be minimized by optimizing pixels independently, minimizing the perceptual error requires coordination between pixels. In the following section, we devise strategies for solving this optimization problem.
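The contrast between the two metrics is easy to demonstrate: shuffling the pixels of an error image leaves the MSE untouched but changes the kernel-weighted error. A minimal sketch (3×3 box kernel as a stand-in for the PSF; all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 32
eps = rng.standard_normal((n, n))                     # an error image
eps_perm = rng.permutation(eps.reshape(-1)).reshape(n, n)

def perceptual(e):
    """||g ⊛ e||² for a 3x3 box kernel g, with periodic boundaries."""
    g = np.zeros((n, n))
    g[[0, 0, 0, 1, 1, 1, -1, -1, -1], [0, 1, -1] * 3] = 1 / 9.0
    out = np.real(np.fft.ifft2(np.fft.fft2(g) * np.fft.fft2(e)))
    return np.sum(out**2)

# MSE (zero-width kernel) is permutation-invariant...
assert np.isclose(np.mean(eps**2), np.mean(eps_perm**2))
# ...but the kernel-weighted error depends on the pixel arrangement.
assert not np.isclose(perceptual(eps), perceptual(eps_perm))
```

This is exactly why minimizing Eq. (4) requires coordinating pixels rather than optimizing them independently.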
4 DISCRETE OPTIMIZATION
In our optimization problem (5), the search space for each sample in every pixel is a high-dimensional unit hypercube. Every point in this so-called primary sample space maps to a light-transport path in the scene [Pharr et al. 2016]. Optimizing for the sample-set image 𝑆 thus entails evaluating the contributions 𝑄(𝑆) of all corresponding paths. This evaluation is costly, and for any non-trivial scene, 𝑄 is a function with complex shape and many discontinuities. This precludes us from studying all (uncountably infinite) sample locations in practice.

To make the problem tractable, we restrict the search in each pixel to a finite number of (pre-defined) sample sets. We devise two variants of the resulting discrete optimization problem, which differ in their definition of the global search space Ω. In the first variant, each pixel has a separate list of sample sets to choose from ("vertical" search space). The setting is similar to that of (multi-tone) halftoning [Lau and Arce 2007], which allows us to import classical optimization techniques from that field, such as iterative minimization, error diffusion, and mask-based dithering. In the second variant, each pixel has one associated sample set, and the search space comprises permutations of these assignments ("horizontal" search space). We develop a greedy iterative optimization method for this second variant.

In contrast to halftoning, in our setting the ground-truth image 𝐼, required to compute the error image 𝜖 during optimization, is not readily available. Below we describe our algorithms assuming the ground truth is available; in Section 5 we will discuss how to substitute it with a surrogate to make the algorithms practical.
4.1 Vertical search space
Our first variant considers a "vertical" search space where the sample set for each of the 𝑁 image pixels is one of 𝑀 given sets:³

    Ω = {𝑆 = {𝑆₁, . . . , 𝑆_𝑁} | 𝑆ᵢ ∈ {𝑆ᵢ,₁, . . . , 𝑆ᵢ,𝑀}}.    (6)

The objective is to find a sample set 𝑆ᵢ for every pixel 𝑖 such that all resulting pixel estimates together minimize the perceptual error (4). This is equivalent to directly optimizing over the 𝑀 possible estimates 𝑄ᵢ,₁, . . . , 𝑄ᵢ,𝑀 for each pixel, with 𝑄ᵢ,ⱼ = 𝑄ᵢ(𝑆ᵢ,ⱼ). These estimates can be obtained by pre-rendering a stack of 𝑀 images 𝑄ⱼ = {𝑄₁,ⱼ, . . . , 𝑄_𝑁,ⱼ}, for 𝑗 = 1..𝑀. The resulting minimization problem reads:

    min_{𝑂 : 𝑂ᵢ ∈ {𝑄ᵢ,₁, . . . , 𝑄ᵢ,𝑀}} ‖𝑔 ⊛ (𝑂 − 𝐼)‖₂².    (7)

³For notational simplicity, and without loss of generality, we assume that the number of candidate sample sets 𝑀 is the same for all pixels; in practice it can vary per pixel.
Iterative minimization. State-of-the-art halftoning methods attack
the problem
(7)
directly via greedy iterative minimization [Analoui
and Allebach 1992;Pappas and Neuho 1999]. After initializing
every pixel to a random quantization level, we traverse the image in
serpentine order (as is standard practice in halftoning) and for each
pixel choose the level that minimizes the energy. Several full-image
iterations are performed; in our experiments convergence to a local
minimum is achieved within 10–20 iterations.
As a further improvement, the optimization can be terminated
when no pixels are updated within one full iteration, or when the
perceptual-error reduction rate drops below a certain threshold.
Traversing the pixels in random order allows terminating at any
point but converges slightly slower.
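The greedy loop above can be sketched as follows. This is a naive reference implementation, not the paper's (it recomputes the full energy per candidate via FFT and omits serpentine order and incremental updates; all names and parameters are our own):

```python
import numpy as np

def greedy_vertical(I, Q, sigma=1.0, iters=10, seed=0):
    """Pick, per pixel, one of M pre-rendered estimates Q[j] (shape (M,H,W))
    so that the filtered error ||g ⊛ (O − I)||² is greedily minimized."""
    M, H, W = Q.shape
    rng = np.random.default_rng(seed)
    # Initialize each pixel to a random candidate estimate.
    O = Q[rng.integers(M, size=(H, W)), np.arange(H)[:, None], np.arange(W)]

    # Periodic Gaussian kernel spectrum for evaluating the energy via FFT.
    d = lambda n: np.minimum(np.arange(n), n - np.arange(n))
    g = np.exp(-(d(H)[:, None]**2 + d(W)[None, :]**2) / (2 * sigma**2))
    g /= g.sum()
    g_hat = np.fft.fft2(g)
    energy = lambda O: np.sum(
        np.real(np.fft.ifft2(g_hat * np.fft.fft2(O - I)))**2)

    for _ in range(iters):
        for i in range(H):
            for k in range(W):
                best = energy(O)
                for j in range(M):            # try every candidate estimate
                    old = O[i, k]
                    O[i, k] = Q[j, i, k]
                    e = energy(O)
                    if e < best:
                        best = e              # keep the improving candidate
                    else:
                        O[i, k] = old         # otherwise revert
    return O

# Toy run: flat 0.5 reference, two candidate "estimates" (0 and 1) per pixel.
I = np.full((8, 8), 0.5)
Q = np.stack([np.zeros((8, 8)), np.ones((8, 8))])
O = greedy_vertical(I, Q, iters=2)
assert set(np.unique(O)) <= {0.0, 1.0}
```

On the toy input the greedy pass drives the selection toward a balanced, halftone-like arrangement of the two levels, mirroring the binary-halftoning special case of Eq. (7).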
Error diusion. A classical halftoning algorithm, error diusion
scans the image pixel by pixel, snapping each reference value to the
closest quantization level and distributing the resulting pixel error
to yet-unprocessed nearby pixels according to a given kernel
𝜅
𝜅
𝜅
. We
use the empirically derived kernel of Floyd and Steinberg [1976]
which has been shown to produce an output that approximately
minimizes Eq. (7) [Hocevar and Niger 2008]. Error diusion is faster
than iterative minimization but yields less optimal solutions.
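The scan-and-diffuse step can be sketched with the classic Floyd–Steinberg weights. This is a plain left-to-right scan with one global level list, a simplification of the paper's setting, where each pixel has its own candidate estimates and the scan is serpentine:

```python
import numpy as np

# Floyd–Steinberg weights: pixel to the right and the three pixels below.
FS = [((0, 1), 7/16), ((1, -1), 3/16), ((1, 0), 5/16), ((1, 1), 1/16)]

def error_diffusion(I, levels):
    """Quantize image I to the given levels, diffusing each pixel's
    quantization error to its yet-unprocessed neighbors."""
    O = I.astype(float).copy()
    H, W = O.shape
    levels = np.asarray(levels, dtype=float)
    for i in range(H):
        for k in range(W):
            old = O[i, k]
            O[i, k] = levels[np.argmin(np.abs(levels - old))]  # snap to closest
            err = old - O[i, k]
            for (di, dk), w in FS:            # push the error downstream
                ii, kk = i + di, k + dk
                if 0 <= ii < H and 0 <= kk < W:
                    O[ii, kk] += err * w

    return O

# Dithering a constant 0.5 image to {0, 1} preserves the average tone.
O = error_diffusion(np.full((32, 32), 0.5), [0.0, 1.0])
assert abs(O.mean() - 0.5) < 0.05
```

Because the four weights sum to one, the diffused error is conserved in the image interior, which is what keeps the local average tone of the output close to the input.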
Dithering. The fastest halftoning approach quantizes pixel values using thresholds stored in a pre-computed dither mask (or matrix) [Spaulding et al. 1997]. For each pixel, the two quantization levels that tightly envelop the reference value (in terms of brightness) are found, and one of the two is chosen based on the threshold assigned to the pixel by the mask.

Dithering can be understood as performing the perceptual error minimization in two steps. First, an offline optimization encodes the error distribution optimal for the target kernel 𝑔 into a mask. Then, for a given image, the error magnitude is minimized by restricting the quantization to the two closest levels per pixel, and the mask-driven choice between them applies the target distribution of error.
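The envelope-and-threshold step can be sketched as follows, using a 4×4 Bayer matrix as a stand-in mask (the paper's masks are blue-noise, optimized offline w.r.t. the kernel 𝑔; names and the mask choice are our own):

```python
import numpy as np

# 4x4 Bayer matrix, normalized to thresholds in (0, 1).
BAYER4 = (np.array([[ 0,  8,  2, 10],
                    [12,  4, 14,  6],
                    [ 3, 11,  1,  9],
                    [15,  7, 13,  5]]) + 0.5) / 16.0

def dither(I, levels):
    """Per pixel, find the two levels enveloping the value and pick one by
    comparing the value's position in that interval against the mask."""
    I = np.asarray(I, dtype=float)
    levels = np.sort(np.asarray(levels, dtype=float))
    H, W = I.shape
    O = np.empty_like(I)
    for i in range(H):
        for k in range(W):
            v = np.clip(I[i, k], levels[0], levels[-1])
            hi = np.searchsorted(levels, v)   # first level >= v
            if levels[hi] == v:
                O[i, k] = v                   # value sits exactly on a level
                continue
            lo = hi - 1
            t = (v - levels[lo]) / (levels[hi] - levels[lo])
            O[i, k] = levels[hi] if t > BAYER4[i % 4, k % 4] else levels[lo]
    return O

# Dithering a constant 0.5 image to {0, 1}: half the thresholds fall below
# 0.5, so exactly half the pixels land on each level.
O = dither(np.full((8, 8), 0.5), [0.0, 1.0])
assert np.isclose(O.mean(), 0.5)
```

Swapping `BAYER4` for a blue-noise mask changes only where the high level lands, i.e., the spatial distribution of the error, not its per-pixel magnitude.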
4.2 Horizontal search space
We now describe the second, "horizontal" discrete variant of our minimization formulation (5). It considers a single sample set 𝑆ᵢ assigned to each of the 𝑁 pixels, all represented together as a sample-set image 𝑆. The search space comprises all possible permutations Π(𝑆) of these assignments:

    Ω = Π(𝑆),  with  𝑆 = {𝑆₁, . . . , 𝑆_𝑁}.    (8)

The goal is to find a permutation 𝜋(𝑆) that minimizes the perceptual error (4). The optimization problem (5) thus takes the form

    min_{𝜋 ∈ Π(𝑆)} ‖𝑔 ⊛ (𝑄(𝜋(𝑆)) − 𝐼)‖₂².    (9)
Algorithm 1. Three algorithms to (approximately) solve the vertical search-space optimization problem (7). The output is an image 𝑂 = {𝑂₁, . . . , 𝑂_𝑁}, given a reference image 𝐼 and a stack of initial image estimates 𝑄₁, . . . , 𝑄_𝑀. Iterative minimization updates pixels repeatedly, for each selecting the estimate that minimizes the perceptual error (4). Error diffusion quantizes each pixel to the closest estimate, distributing the error to its neighbors based on a kernel 𝜅. Dithering quantizes each pixel in 𝐼 based on thresholds looked up in a dither mask 𝐵 (optimized w.r.t. the kernel 𝑔).

 1: function IterativeMinimization(𝑔, 𝐼, 𝑄₁, ..., 𝑄_𝑀, 𝑂, 𝑇)
 2:   𝑂 = {𝑄₁,rand, . . . , 𝑄_𝑁,rand}              ▷ Init each pixel to random estimate
 3:   for 𝑇 iterations do
 4:     for pixel 𝑖 = 1..𝑁 do                      ▷ E.g., random or serpentine order
 5:       for estimate 𝑄ᵢ,ⱼ ∈ {𝑄ᵢ,₁, . . . , 𝑄ᵢ,𝑀} do
 6:         if 𝑂ᵢ = 𝑄ᵢ,ⱼ reduces ‖𝑔 ⊛ (𝑂 − 𝐼)‖₂² then
 7:           𝑂ᵢ = 𝑄ᵢ,ⱼ                            ▷ Update estimate

 8: function ErrorDiffusion(𝜅, 𝐼, 𝑄₁, ..., 𝑄_𝑀, 𝑂)
 9:   𝑂 = 𝐼                                       ▷ Initialize solution to reference
10:   for pixel 𝑖 = 1..𝑁 do                       ▷ E.g., serpentine order
11:     𝑂ᵢᵒˡᵈ = 𝑂ᵢ
12:     𝑂ᵢ = arg min_{𝑄ᵢ,ⱼ} ‖𝑂ᵢᵒˡᵈ − 𝑄ᵢ,ⱼ‖₂²
13:     𝜖ᵢ = 𝑂ᵢᵒˡᵈ − 𝑂ᵢ                           ▷ Diffuse error 𝜖ᵢ to yet-unprocessed neighbors
14:     for unprocessed pixel 𝑘 within support of 𝜅 around 𝑖 do
15:       𝑂ₖ += 𝜖ᵢ 𝜅ₖ₋ᵢ

16: function Dithering(𝐵, 𝐼, 𝑄₁, ..., 𝑄_𝑀, 𝑂)
17:   for pixel 𝑖 = 1..𝑁 do                       ▷ Find tightest interval [𝑄ᵢˡᵒʷ, 𝑄ᵢʰⁱᵍʰ] containing 𝐼ᵢ
18:     𝑄ᵢˡᵒʷ = arg max_{𝑄ᵢ,ⱼ ≤ 𝐼ᵢ} 𝑄ᵢ,ⱼ
19:     𝑄ᵢʰⁱᵍʰ = arg min_{𝑄ᵢ,ⱼ > 𝐼ᵢ} 𝑄ᵢ,ⱼ
20:     if 𝐼ᵢ − 𝑄ᵢˡᵒʷ < 𝐵ᵢ (𝑄ᵢʰⁱᵍʰ − 𝑄ᵢˡᵒʷ) then  ▷ Set 𝑂ᵢ to 𝑄ᵢˡᵒʷ or 𝑄ᵢʰⁱᵍʰ using threshold 𝐵ᵢ
21:       𝑂ᵢ = 𝑄ᵢˡᵒʷ
22:     else
23:       𝑂ᵢ = 𝑄ᵢʰⁱᵍʰ
We can explore the permutation space Π(𝑆) by swapping the sample-set assignments between pixels. The minimization requires updating the image estimate 𝑄(𝜋(𝑆)) for each permutation 𝜋(𝑆), i.e., after every swap. Such updates are costly as they involve re-sampling both pixels in each of potentially millions of swaps. We need to eliminate these extra rendering invocations during the optimization to make it practical. To that end, we observe that for pixels solving similar light-transport integrals, swapping their sample sets gives a similar result to swapping their estimates. We therefore restrict the search space to permutations that can be generated through swaps between such (similar) pixels. This enables an efficient optimization scheme that directly swaps the pixel estimates of an initial rendering 𝑄(𝑆).
Error decomposition. Formally, we express the estimate produced by a sample-set permutation in terms of permuting the pixels of the initial rendering: 𝑄(𝜋(𝑆)) = 𝜋(𝑄(𝑆)) + Δ(𝜋). The error Δ is zero when the swapped pixels solve the same integral. Substituting into Eq. (9), we can approximate the perceptual error by (see Appendix A)

    𝐸(𝜋) = ‖𝑔 ⊛ (𝜋(𝑄(𝑆)) − 𝐼 + Δ(𝜋))‖₂²    (10a)
         ≲ ‖𝑔 ⊛ (𝜋(𝑄(𝑆)) − 𝐼)‖₂² + ‖𝑔‖₁² Σᵢ 𝑑(𝑖, 𝜋(𝑖)) = 𝐸_𝑑(𝜋),    (10b)
where we write the error 𝐸(𝜋) as a function of 𝜋 only, to emphasize that everything else is fixed during the optimization. In the approximation 𝐸_𝑑, the term 𝑑(𝑖, 𝜋(𝑖)) measures the dissimilarity between pixel 𝑖 and the pixel 𝜋(𝑖) it is relocated to by the permutation. The purpose of this metric is to predict how different we expect the result of re-estimating the pixels after swapping their sample sets to be, compared to simply swapping their initial estimates. It can be constructed based on knowledge or assumptions about the image.
Local similarity assumption. Our implementation uses a simple binary dissimilarity function that returns zero when i and π(i) are within some distance r, and infinity otherwise. We set r ∈ [1, 3]; ideally it should be locally adapted to the image smoothness. This restricts the search space Π(𝑆) to permutations that swap only adjacent pixels, where it is more likely that Δ is small. More elaborate heuristics could better account for pixel (dis)similarity.
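As a concrete illustration, the binary dissimilarity above can be sketched as follows. This is a minimal sketch of our own; the row-major pixel indexing and the Euclidean grid distance are assumptions, not prescribed by the formulation.

```python
import math

def binary_dissimilarity(i, j, width, r=1.0):
    """Binary dissimilarity d(i, j) between pixel indices i and j
    (row-major indices in an image of the given width): zero if pixel j
    lies within Euclidean distance r of pixel i, infinity otherwise."""
    yi, xi = divmod(i, width)
    yj, xj = divmod(j, width)
    dist = math.hypot(xi - xj, yi - yj)
    return 0.0 if dist <= r else math.inf
```

Any swap whose dissimilarity is infinite is rejected outright by the minimization of E_d, which is how the metric restricts the search space to nearby pixels.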
Iterative minimization. We devise a greedy iterative minimization scheme for this horizontal formulation, similar to the one in Alg. 1. Given an initial image estimate 𝑄(𝑆), produced by randomly assigning a sample set to every pixel, our algorithm goes over all pixels and for each considers swaps within a (2R + 1)² neighborhood; we use R = 1. The swap that brings the largest reduction in the perceptual error E_d is accepted. Algorithm 2 provides pseudocode. In our experiments we run T = 10 full-image iterations. As before, the algorithm could be terminated based on the swap reduction rate or the error reduction rate. We explore additional optimizations in supplemental Section 3.
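The greedy scheme can be sketched in a few lines. The toy Python version below is our own illustration, not the paper's implementation: it keeps every swap within the (2R+1)² neighborhood (so that, with r ≥ the neighborhood radius, the dissimilarity term of E_d vanishes and only ‖g ∗ (Q − I)‖₂² remains), and it recomputes the full-image error for every candidate swap for clarity, whereas a practical implementation would update the convolved error incrementally.

```python
import numpy as np

def conv2(img, k):
    """'Same' 2D convolution with zero padding (k is a small kernel, e.g., 3x3)."""
    kh, kw = k.shape
    p = np.pad(img, ((kh // 2,), (kw // 2,)))
    out = np.zeros_like(img)
    for dy in range(kh):
        for dx in range(kw):
            out += k[dy, dx] * p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def perceptual_error(q, ref, k):
    """E = ||g * (Q - I)||_2^2, the first term of E_d in Eq. (10b)."""
    return float(np.sum(conv2(q - ref, k) ** 2))

def greedy_swap(q, ref, k, iters=10, R=1):
    """Greedily swap pixel estimates within a (2R+1)^2 neighborhood whenever
    the swap lowers the perceptual error; the output is a permutation of the
    input pixels (cf. Alg. 2)."""
    q = q.copy()
    h, w = q.shape
    for _ in range(iters):
        improved = False
        for y in range(h):
            for x in range(w):
                best, best_j = perceptual_error(q, ref, k), None
                for dy in range(-R, R + 1):
                    for dx in range(-R, R + 1):
                        yy, xx = y + dy, x + dx
                        if (dy, dx) == (0, 0) or not (0 <= yy < h and 0 <= xx < w):
                            continue
                        q[y, x], q[yy, xx] = q[yy, xx], q[y, x]  # try swap
                        e = perceptual_error(q, ref, k)
                        q[y, x], q[yy, xx] = q[yy, xx], q[y, x]  # undo
                        if e < best:
                            best, best_j = e, (yy, xx)
                if best_j is not None:
                    yy, xx = best_j
                    q[y, x], q[yy, xx] = q[yy, xx], q[y, x]
                    improved = True
        if not improved:
            break
    return q
```

Because the scheme only permutes existing estimates, the multiset of pixel values is preserved exactly; only their screen-space arrangement changes.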
The parameter R balances the cost of one iteration against the amount of exploration it can do. Note that this parameter is different from the maximal relocation distance r in the dissimilarity metric, with R ≤ r.
Due to the pixel (dis)similarity assumptions, the optimization can produce some mispredictions, i.e., it may swap the estimates of pixels for which swapping the sample sets produces a significantly different result. Thus the image π(𝑄(𝑆)) cannot be used directly as a final estimate. We therefore re-render the image using the optimized permutation π to obtain the final estimate 𝑄(π(𝑆)).
4.3 Discussion
Search space. We discretize the search space Ω to make the optimization problem (5) tractable. To make it truly practical, it is also necessary to avoid repeated image estimation (i.e., 𝑄(𝑆) evaluation) during the search for the solution 𝑆. Our vertical (7) and horizontal (9) optimization variants are formulated specifically with this goal in mind. All methods in Algs. 1 and 2 operate on pre-generated image estimates that constitute the solution search space.
Our vertical formulation takes a collection of M input estimates {Q_{i,j} = Q_i(S_{i,j})}_{j=1..M} for every pixel i, one for each sample set S_{i,j}. Noting that the Q_{i,j} are MC estimates of the true pixel value, this collection can be cheaply expanded to a size as large as 2^M − 1 by
Algorithm 2. Given a convolution kernel g, a reference image I, an initial sample-set assignment 𝑆, and an image estimate 𝑄(𝑆) computed with that assignment, our greedy algorithm iteratively swaps sample-set assignments between neighboring pixels to minimize the perceptual error E_d (10b), producing a permutation π of the initial assignment.

1: function IterativeMinimization(g, I, 𝑆, 𝑄(𝑆), T, R, π)
2:     π = identity permutation                ▷ Initialize solution permutation
3:     for T iterations do
4:         for pixel i = 1..N do               ▷ E.g., random or serpentine order
5:             π′ = π                          ▷ Find best pixel in neighborhood to swap with
6:             for pixel j in (2R+1)² neighborhood around i do
7:                 if E_d(π_{i↔j}(𝑆)) < E_d(π′(𝑆)) then    ▷ Eq. (10b)
8:                     π′ = π_{i↔j}            ▷ Accept swap as current best
9:             π = π′
taking the average of the estimates in each of its subsets (excluding the empty subset). In practice only a fraction of these subsets can be used, since the size of the power set grows exponentially with M. It may seem that this approach ends up wastefully throwing away most input estimates. But note that these estimates actively participate in the optimization and provide the space of possible solutions. Carefully selecting a subset per pixel can yield a higher-fidelity result than blindly averaging all available estimates, as we will show repeatedly in Section 7.
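The subset-average expansion can be sketched directly; this is a minimal illustration of the counting argument above, since each non-empty subset average of unbiased MC estimates is itself an unbiased estimate.

```python
from itertools import combinations

def expand_estimates(estimates):
    """Expand M per-pixel MC estimates into the averages of all
    2^M - 1 non-empty subsets; each average is itself an unbiased
    estimate of the pixel value and enlarges the vertical search space."""
    out = []
    for size in range(1, len(estimates) + 1):
        for subset in combinations(estimates, size):
            out.append(sum(subset) / len(subset))
    return out
```

For M = 4 input estimates this yields the 2⁴ − 1 = 15 candidates per pixel used by our most capable vertical variant (Fig. 1).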
In contrast, our horizontal formulation builds a search space given just a single input estimate Q_i per pixel. We consciously restrict the space to permutations between nearby pixels, so as to leverage local pixel similarity and avoid repeated pixel evaluation during optimization. The disadvantage of this approach is that it requires re-rendering the image after optimization, with uncertain results (due to mispredictions) that can lead to local degradation of image quality. Mispredictions can be reduced by exploiting knowledge about the rendering function 𝑄(𝑆), e.g., through depth, normal, or texture buffers; we explore this in supplemental Section 2. Additionally, while methods like iterative minimization (Alg. 2) and dithering (Section 5.2) can be adapted to this search space, reformulating other halftoning algorithms such as error diffusion is non-trivial.
A hybrid formulation is also conceivable, taking a single input estimate per pixel (like horizontal methods) and considering a separate (vertical) search space for each pixel, constructed by borrowing estimates from neighboring pixels. Such an approach could benefit from advanced halftoning optimization methods, but could also suffer from mispredictions and require re-rendering. We leave the exploration of this approach to future work.
Finally, it is worth noting that discretization is not the only route to practicality. Equation (5) can be optimized on the continuous space Ω if some cheap-to-evaluate proxy for the rendering function is available. Such a continuous approximation may be analytical (based on prior knowledge or assumptions) or obtained by reconstructing a point-wise evaluation. However, continuous-space optimization can be difficult in high dimensions (e.g., number of light bounces), where non-linearities and non-convexity are exacerbated.
Optimization strategy. Another important choice is the optimization method. For the vertical formulation, iterative minimization provides the best flexibility and quality but is the most computationally expensive. Error diffusion and dithering are faster but only approximately solve Eq. (7).
One dierence between classical halftoning and our vertical set-
ting is that quantization levels are non-uniformly distributed and
dier between pixels. This further increases the gap in quality be-
tween the image-adaptive iterative minimization and error diu-
sion (which can correct for these dierences) and the non-adaptive
dithering, compared to the halftoning setting. The main advantage
of dithering is that it involves the kernel
𝑔
𝑔
𝑔
explicitly, while the
error-diusion kernel 𝜅
𝜅
𝜅cannot be related directly to 𝑔
𝑔
𝑔.
5 PRACTICAL APPLICATION
We now turn to the practical use of our error optimization framework. In both our discrete formulations from Section 4, the search space is determined by a given collection of sample sets S_{i,j} for every pixel i, with j = 1...M (in the horizontal setting M = 1). The optimization is then driven by the corresponding estimates Q_{i,j}. We consider two ways to obtain these estimates, leading to different practical trade-offs: (1) direct evaluation of the samples by rendering a given scene, and (2) using a proxy for the rendering function. We show how prior works correspond to using either approach within our framework, which helps expose their implicit assumptions.
5.1 Surrogate for ground truth
The goal of our optimization is to perceptually match an image estimate to the ground truth I as closely as possible. Unfortunately, the ground truth is unknown in our setting, unlike in halftoning. The best we can do is substitute it with a surrogate image. Such an image can be obtained either from available pixel estimates or by making assumptions about the ground truth. We will discuss specific approaches in the following, but it is already worth noting that all existing error-distribution methods rely on such a surrogate, whether explicitly or implicitly. And since the surrogate guides the optimization, its fidelity directly impacts the fidelity of the output.
5.2 A-posteriori optimization
Given a scene and a viewpoint, initial pixel estimates can be obtained by invoking the renderer with the input samples: Q_{i,j} = Q_i(S_{i,j}). A surrogate can then be constructed from those estimates; in our implementation we denoise the estimate-average image (Section 7.1). Having the estimates and the surrogate, we can run any of the methods in Algs. 1 and 2. Vertical algorithms directly output an image O; horizontal optimization yields a sample-set image 𝑆 that requires an additional rendering invocation: O = 𝑄(𝑆).
This general approach of utilizing sampled image information was coined a-posteriori optimization by Heitz and Belcour [2019]; they proposed two such methods. Their histogram method operates in a vertical setting, choosing one of the (sorted) estimates for each pixel based on the respective value in a given blue-noise dither mask. Such sampling corresponds to using an implicit surrogate that is the median estimate for every pixel, which is what the mean of the dither mask maps to. Importantly, any one of the estimates for a pixel can be selected, whereas in classical dithering the choice is between the two quantization levels that tightly envelop the reference value (Section 4.1) [Spaulding et al. 1997]. Such selection can yield large error, especially for pixels whose corresponding mask values deviate strongly from the mask mean. This produces image fireflies that do not appear if simple estimate averages are taken instead (see Fig. 6).
The permutation method of Heitz and Belcour [2019] operates in a horizontal setting. Given an image estimate, it finds pixel permutations within small tiles that minimize the distance between the estimates and the values of a target blue-noise mask. This matching transfers the mask's distribution to the image signal rather than to its error. The two are equivalent only when the signal within each tile is constant. The implicit surrogate in this method is thus a tile-wise constant image (shown more formally in supplemental Section 5). In our framework the use of a surrogate is explicit, which enables full control over the quality of the error distribution.
5.3 A-priori optimization
Optimizing perceptual error is possible even in the absence of information about a specific image. In our framework, the goal of such an a-priori approach (as coined by Heitz and Belcour [2019]) is to compute a sample-set image 𝑆 by using surrogates for both the ground-truth image I and the rendering function 𝑄(𝑆), constructed based on smoothness assumptions. The samples 𝑆 can then produce a high-fidelity estimate of any image that meets those assumptions.
Lacking prior knowledge, one could postulate that every pixel i has the same rendering function: Q_i(·) = Q(·); the image surrogate I is thus constant. While in practice this assumption (approximately) holds only locally, the optimization kernel g is also expected to have compact support. The shape of Q can be assumed to be (piecewise) smooth and approximable by a cheap analytical function Q̃.
With the above surrogates in place, we can run our algorithms to optimize a sample-set image 𝑆. The constant-image assumption makes horizontal algorithms well-suited for this setting, as it makes the swapping-error term Δ in Eq. (10a) vanish, simplifying the perceptual error to E(π(𝑆)) = ‖g ∗ π(ε(𝑆))‖₂². This enables the optimization to consider swaps between any two pixels in the error image ε(𝑆). That image can be quickly rendered in advance, by invoking the render-function surrogate Q̃ with the input sample-set image.
Georgiev and Fajardo [2016] take a similar approach, with swapping based on simulated annealing. Their empirically motivated optimization energy uses an explicit (Gaussian) kernel, but instead of computing an error image through a rendering surrogate, it postulates that the distance between two sample sets is representative of the difference between their corresponding pixel errors. Such a smoothness assumption holds for bi-Lipschitz-continuous functions. Their energy can thus be understood to compactly encode properties of a class of rendering functions.
Heitz et al. [2019] adopt the approach of Georgiev and Fajardo [2016], but their energy function replaces the distance between sample sets by the difference between their corresponding pixel errors. The errors are computed using an explicit render-function surrogate. They optimize for a large number of simple surrogates simultaneously, and leverage a compact representation of Sobol sequences to also support progressive sampling. We relate these two prior works to ours more formally in supplemental Section 6, also showing how our perceptual error formulation can be incorporated into the method of Heitz et al. [2019].
The approach of Ahmed and Wonka [2020] performs on-the-fly scrambling of a Sobol sequence applied to the entire image. Image pixels are visited in a Morton Z-order modified to break its regularity. The resulting sampler diffuses Monte Carlo error over hierarchically nested blocks of pixels, giving a perceptually pleasing error distribution. However, the algorithmic nature of this approach introduces more implicit assumptions than prior works, making it difficult to analyze.
Our theoretical formulation and optimization methods enable the construction of a-priori sampling masks in a principled way. For horizontal optimization, we recommend using our iterative algorithm (Alg. 2), which can bring significant performance improvement over simulated annealing; such a speed-up was reported by Analoui and Allebach [1992] for dither-mask construction. Vertical optimization is an interesting alternative, where for each pixel one of several sample sets would be chosen; this would allow for varying the sample count per pixel. Note that the ranking-key optimization for progressive sampling of Heitz et al. [2019] is a form of vertical optimization.
5.4 Discussion
Our formulation expresses a-priori and a-posteriori optimization under a common framework that unifies existing methods. These two approaches come with different trade-offs. A-posteriori optimization utilizes sampled image information, and in a vertical setting requires no assumptions except for what is necessary for surrogate construction. It thus has the potential to achieve high output fidelity, especially on scenes with complex lighting, as it is oblivious to the shape and dimensionality of the rendering function, as first demonstrated by Heitz and Belcour [2019]. However, it requires pre-sampling (also post-sampling in the horizontal setting), and the optimization is sensitive to the surrogate quality.
Making aggressive assumptions allows a-priori optimization to be performed offline once, with the produced samples 𝑆 subsequently used to render any image. This approach resembles classical sample stratification, where the goal is also to optimize sample distributions w.r.t. some smoothness expectations. In fact, our a-priori formulation subsumes the per-pixel stratification problem, since the perceptual error is minimized when the error image ε(𝑆) has both low magnitude and a visually pleasing distribution. Two main factors limit the potential of a-priori optimization, especially on scenes with non-uniform multi-bounce lighting. One is the general difficulty of optimizing sample distributions in high-dimensional spaces. The other is that in such scenes the complex shape of the rendering function, both in screen and sample space, can easily break smoothness assumptions and cause high (perceptual) error.
To test the capabilities of our formulation, in the following we focus on the a-posteriori approach. In the supplemental document we explore a-priori optimization based on our framework. The two approaches can also be combined, e.g., by seeding a-posteriori optimization with a-priori-optimized samples, whose good initial guess can improve the quality and convergence speed.
6 EXTENSIONS
Our perceptual error formulation (4) approximates the effect of the HVS PSF through kernel convolution. In this section we analyze the relationship between the kernel and the viewing distance, as well as the impact of the kernel shape on the error distribution. We also present extensions that account for the HVS non-linearities in handling tone and color.
Kernels and viewing distance. As discussed in Section 3.1, the PSF is usually modelled over a range of viewing distances. The effect of the PSF depends on the frequencies of the viewed signal and the distance from which it is viewed. Pappas and Neuhoff [1999] have found that the Gaussian is a good approximation to the PSF in the context of halftoning. They derived its standard deviation σ in terms of the minimum viewing distance for a given screen resolution:

    σ = 0.00954 / τ,   where   τ = (180/π) · 2 arctan(1 / (2RD)).        (11)

Here τ is the visual angle between the centers of two neighboring pixels (in degrees) for screen resolution R (in 1/inch) and viewing distance D (in inches). The minimum viewing distance for a given standard deviation and resolution can be obtained via the inverse formula: D = (2R tan((π/180) · 0.00954/(2σ)))⁻¹. Larger σ values correspond to larger observer distances; we demonstrate the effect of this in Fig. 3, where the images become increasingly blurrier. In Fig. 4a we compare this Gaussian kernel to two well-established PSF models from the halftoning literature [Näsänen 1984; González et al. 2006]. We have found the differences between all three to be negligible; we use the cheaper-to-evaluate Gaussian in all our experiments.
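Eq. (11) and its inverse can be sketched directly (a minimal sketch; the function names are ours):

```python
import math

def sigma_from_distance(R_dpi, D_inches):
    """Gaussian PSF standard deviation (in pixels) from Eq. (11):
    sigma = 0.00954 / tau, with tau = (180/pi) * 2 * atan(1 / (2 R D))
    the visual angle (degrees) between neighboring pixel centers."""
    tau = (180.0 / math.pi) * 2.0 * math.atan(1.0 / (2.0 * R_dpi * D_inches))
    return 0.00954 / tau

def min_viewing_distance(R_dpi, sigma):
    """Inverse formula: D = (2 R tan((pi/180) * 0.00954 / (2 sigma)))^-1,
    the minimum viewing distance (inches) for a given sigma and resolution."""
    return 1.0 / (2.0 * R_dpi * math.tan(math.pi / 180.0 * 0.00954 / (2.0 * sigma)))
```

The two functions are exact inverses of each other, so one can round-trip between kernel width and viewing distance.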
Decoupling the viewing distances. Being based on perceptual models of the HVS [Sullivan et al. 1991; Analoui and Allebach 1992], our formulation (4) assumes that the estimate Q and the reference I are viewed from the same (range of) distance(s). The two distances can be decoupled by applying different kernels g and g̃ to Q and I:

    E = ‖g ∗ Q − g̃ ∗ I‖₂².        (12)

Minimizing this error makes Q, viewed from some distance D_g, appear similar to I seen from a different distance D_g̃. The special case of using a Kronecker delta kernel g̃ = δ, i.e., with the reference I seen from up close, yields E = ‖g ∗ Q − I‖₂². This has been shown to have an edge-enhancing effect [Anastassiou 1989; Pappas and Neuhoff 1999], which we show in Fig. 4b. We use g̃ = δ in all our experiments.
Tone mapping. Considering that the optimized image will be viewed on media with limited dynamic range (e.g., screen or paper), we can incorporate a tone-mapping operator 𝒯 into the perceptual error (4):

    E = ‖g ∗ ε_𝒯‖₂² = ‖g ∗ (𝒯(Q) − 𝒯(I))‖₂².        (13)

Doing this also bounds the per-pixel error ε_𝒯 = 𝒯(Q) − 𝒯(I), suppressing outliers and making the optimization more robust in scenes with high dynamic range. We illustrate this improvement in Fig. 4c, where an ACES [Arrighetti 2017] tone-mapping operator is applied to the optimized image. Optimizing w.r.t. the original perceptual error (4) yields a noisy and overly dark image compared to the tone-mapped ground truth. Accounting for tone mapping in the optimization through Eq. (13) yields a more faithful result.
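The bounding effect of Eq. (13) on the per-pixel error is easy to see in code. This sketch uses the simple clamping operator from Section 7.1; a more elaborate operator such as ACES would slot into the same place.

```python
import numpy as np

def tonemap_clamp(img):
    """Simple tone-mapping operator T: clamp pixel values to [0, 1]
    (the operator used in our setup, Section 7.1)."""
    return np.clip(img, 0.0, 1.0)

def tonemapped_pixel_error(q, ref, tonemap=tonemap_clamp):
    """Per-pixel error eps_T = T(Q) - T(I) from Eq. (13). With a clamping
    T, every entry lies in [-1, 1], so HDR outliers (fireflies) cannot
    dominate the optimization energy."""
    return tonemap(q) - tonemap(ref)
```

A firefly with radiance 100 thus contributes the same bounded error as any fully saturated pixel, instead of an error of ~10⁴ after squaring.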
Fig. 4. (a) Kernel comparison: our binomial Gaussian approximation g (3×3 pixels, σ = √(2/π)) performs on par with the state-of-the-art halftoning kernels of Näsänen [1984] and González et al. [2006]. (b) Kernel sharpening effect: setting the reference-image kernel g̃ in Eq. (12) from g̃ = g to a zero-width δ kernel sharpens the output. (c) Tone mapping (ACES): incorporating tone mapping via Eq. (13) (linear vs. tone-mapped error, against the ground truth). (d) Color handling: incorporating color via Eq. (14) (grayscale vs. color error, against the ground truth).
Fig. 5. Our formulation (5) allows optimizing the error distribution of an image w.r.t. arbitrary kernels: shown are a white-noise input and low-pass (blue noise), band-stop (green noise), high-pass (red noise), band-pass (violet noise), low-pass anisotropic, and spatially varying targets. Here we adapt our horizontal iterative minimization (Alg. 2) to directly swap the pixels of a white-noise input image. Insets show the power spectra of the target kernels (top left) and the optimized images (bottom right).
Color handling. While the HVS reacts more strongly to luminance than color, ignoring chromaticity entirely (e.g., by computing the error image ε from per-pixel luminances) can have a negative effect on the distribution of color noise in the image. To that end, we can penalize the perceptual error of each color channel c ∈ C separately:

    E = ∑_{c∈C} λ_c ‖g_c ∗ (Q_c − I_c)‖₂²,        (14)

where λ_c is a per-channel weight. In our experiments, we use an RGB space C = {r, g, b}, set λ_c = 1, and use the same kernel g_c = g for every channel. Figure 4d shows the improvement in color noise over using grayscale perceptual error. Depending on the color space, the per-channel kernels may differ (e.g., YCbCr) [Sullivan et al. 1991]. As an alternative, one could decouple the channels from the input estimates and optimize each channel separately, assembling the results into a color image. In a vertical setting, this decoupling extends the per-pixel optimization search space size from M to M^C.
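The per-channel error of Eq. (14) can be sketched as follows (a toy illustration with our own helper names; the zero-padded convolution is an implementation assumption):

```python
import numpy as np

def conv2_same(img, k):
    """Zero-padded 'same' 2D convolution (toy helper for this sketch)."""
    kh, kw = k.shape
    p = np.pad(img, ((kh // 2,), (kw // 2,)))
    return sum(k[dy, dx] * p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
               for dy in range(kh) for dx in range(kw))

def color_perceptual_error(q_rgb, ref_rgb, kernel, weights=(1.0, 1.0, 1.0)):
    """E = sum_c lambda_c ||g_c * (Q_c - I_c)||_2^2  (Eq. (14)), using the
    same kernel g for every channel, as in our experiments."""
    return float(sum(w * np.sum(conv2_same(q_rgb[..., c] - ref_rgb[..., c],
                                           kernel) ** 2)
                     for c, w in enumerate(weights)))
```

With a Dirac kernel the expression reduces to a per-channel sum of squared differences, which makes the formula easy to sanity-check.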
Kernel shape impact. To test the robustness of our framework, we analyze kernels with spectral characteristics other than isotropic blue noise in Fig. 5. We run our iterative pixel-swapping algorithm (Alg. 2) to optimize the shape of a white-noise input, which produces a spectral distribution inverse to that of the target kernel. The rightmost image in the figure shows the result of using a spatially varying kernel that is a convex combination of a low-pass Gaussian and a high-pass anisotropic kernel, with the interpolation parameter varying horizontally across the image. Our algorithm can adapt the noise shape well.
7 RESULTS
We now present empirical validation of our error optimization framework in the a-posteriori setting described in Section 5.2. We render static images and animations of several scenes, comparing our algorithms to those of Heitz and Belcour [2019].
7.1 Setup
Perceptual error model. We build a perceptual model by combining all extensions from Section 6. Our estimate-image kernel g is a binomial approximation of a Gaussian [Lindeberg 1990]. For performance reasons, and to allow smaller viewing distances, we use a 3×3-pixel kernel with standard deviation σ = √(2/π) (see Fig. 4a).
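A 3×3 binomial approximation can be sketched as the outer product of the 3-tap binomial filter with itself; that this exact variant matches our kernel is an assumption of the sketch (the per-axis standard deviation of [1, 2, 1]/4 is √0.5 ≈ 0.71, close to √(2/π) ≈ 0.80).

```python
import numpy as np

def binomial_kernel_3x3():
    """3x3 binomial approximation of a Gaussian [Lindeberg 1990]:
    the outer product of the 1D binomial filter [1, 2, 1]/4 with itself.
    The kernel is normalized (sums to 1) and symmetric."""
    b = np.array([1.0, 2.0, 1.0]) / 4.0
    return np.outer(b, b)
```

The separable outer-product form keeps the kernel cheap to evaluate, which is the reason we prefer it over the tabulated PSF models of Fig. 4a.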
Plugging this σ value into the inverse of Eq. (11), the corresponding minimum viewing distance is D = 4792/R inches for a screen resolution of R dpi (e.g., 16 inches at 300 dpi). We recommend viewing from a larger distance, to reduce the effect of our 3×3 kernel discretization. We use a Dirac reference-image kernel g̃ = δ, and incorporate a simple tone-mapping operator 𝒯 that clamps pixel values to [0, 1]. The final error model reads:
    E = ∑_{c∈{r,g,b}} ‖g ∗ 𝒯(Q_c) − δ ∗ 𝒯(I_c)‖₂²,        (15)
where I is the surrogate image whose construction we describe below. For dithering we convert RGB colors to luminance, which reduces the number of components in the error (15) to one.
Methods. We compare our four methods from Algs. 1 and 2 to the histogram and permutation methods of Heitz and Belcour [2019]. For our vertical and horizontal iterative minimizations we set the maximum iteration count to 100 and 10, respectively. For error diffusion we use the kernel of Floyd and Steinberg [1976], and for dithering we use a void-and-cluster mask [Ulichney 1993]. For our horizontal iterative minimization we use a search radius R = 1 and allow pixels to travel within a disk of radius r = 1 from their original location in the dissimilarity metric. For the permutation method of Heitz and Belcour [2019] we obtained the best results with tile size 8×8. (Our r = 1 approximately corresponds to their tile size 3×3.)
Rendering. All scenes were rendered with PBRT [Pharr et al. 2016] using unidirectional or bidirectional path tracing. None of the methods depend on the sampling dimensionality, though we set the maximum path depth to 5 for all scenes to maintain reasonable rendering times. The ground-truth images have been generated using a Sobol sampler with at least 1024 samples per pixel (spp); for all test renders we use a random sampler. To facilitate numerical-error comparisons between the different methods, we trace the primary rays through the pixel centers.
Surrogate construction. To build a surrogate image for our methods, we filter the per-pixel averaged input estimates using Intel Open Image Denoise [Intel 2018], which also leverages surface-normal and albedo buffers, taking about 0.5 sec for a 512×512 image. Recall that the methods of Heitz and Belcour [2019] utilize implicit surrogates.
Image-quality metrics. We evaluate the quality of some of our results using the HDR-VDP-2 perceptual metric [Mantiuk et al. 2011], with parameters matching our binomial kernel. We compute error-detection probability maps which indicate the likelihood for a human observer to notice a difference from the ground truth.
Additionally, we analyze the local blue-noise quality of the error image ε = 𝒯(Q) − 𝒯(I). We split the image into tiles of 32×32 pixels and compute the Fourier power spectrum of each tile. For visualization purposes, we apply a standard logarithmic transform c·ln(1 + ε̂) to every resulting pixel value ε̂ and compute the normalization factor c per tile so that the maximum final RGB value within the tile is (1, 1, 1). Note that the error image ε is computed w.r.t. the ground truth I and not the surrogate, which quantifies the blue-noise distribution objectively. The supplemental material contains images of the tiled power spectra for all experiments.
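The tiled visualization can be sketched as follows (a minimal single-channel sketch; the tile-divisibility requirement and the omission of an fftshift for display are our simplifications):

```python
import numpy as np

def tiled_log_power_spectrum(err, tile=32):
    """Per-tile Fourier power spectrum of an error image, log-transformed
    as c * ln(1 + e) and normalized per tile so that each tile's maximum
    value is 1, mirroring the visualization described above.
    Assumes the image dimensions are multiples of the tile size."""
    h, w = err.shape
    out = np.zeros_like(err)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            power = np.abs(np.fft.fft2(err[y:y + tile, x:x + tile])) ** 2
            v = np.log1p(power)            # ln(1 + e) transform
            m = v.max()                    # per-tile normalization factor
            out[y:y + tile, x:x + tile] = v / m if m > 0 else v
    return out
```

For a blue-noise error distribution, each tile's spectrum shows depleted energy around the (DC) center and a bright high-frequency ring.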
We compare images quantitatively via traditional MSE as well as a metric derived from our perceptual error formulation. Our perceptual MSE (pMSE) evaluates the error (15) of an estimate image w.r.t. the ground truth, normalized by the number of pixels N and channels C: pMSE = E/(NC). It generalizes the MSE with a perceptual, i.e., non-delta, kernel g. Table 1 summarizes the results.
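The pMSE metric can be sketched directly from Eq. (15) (our own minimal implementation; the zero-padded convolution and the clamping tone map are the assumptions stated in the setup above):

```python
import numpy as np

def pmse(q, ref, kernel, tonemap=lambda x: np.clip(x, 0.0, 1.0)):
    """Perceptual MSE: the error of Eq. (15) normalized by pixel and
    channel count, pMSE = E / (N * C). With a 1x1 Dirac kernel this
    reduces to the ordinary MSE of the tone-mapped images."""
    h, w, C = q.shape
    kh, kw = kernel.shape
    E = 0.0
    for c in range(C):
        diff = tonemap(q[..., c]) - tonemap(ref[..., c])
        p = np.pad(diff, ((kh // 2,), (kw // 2,)))  # zero padding
        conv = sum(kernel[dy, dx] * p[dy:dy + h, dx:dx + w]
                   for dy in range(kh) for dx in range(kw))
        E += float(np.sum(conv ** 2))
    return E / (h * w * C)
```

The delta-kernel special case is a convenient sanity check, since it must agree with the plain per-pixel MSE.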
7.2 Rendering comparisons
All methods. Figure 6 shows an equal-sample comparison of all methods. Vertical methods select one of the 4 input samples per pixel; horizontal methods are fed a 2-spp average for every pixel, and another 2 spp are used to render the final image after optimization. Our methods consistently perform best visually, with the vertical iterative minimization achieving the lowest perceptual error, as corroborated by the HDR-VDP-2 detection maps. Error diffusion is not far behind in quality and, along with dithering, is the fastest of all methods. Dithering is similar to Heitz and Belcour's histogram method but yields a notably better result, thanks to using a superior surrogate and performing the thresholding as in the classical halftoning setting (see Section 5.2). Horizontal methods perform worse due to noisier input data (half the spp), worse surrogates derived from it, and also mispredictions (which necessitate re-rendering). Ours still uses a better surrogate than Heitz and Belcour's permutation method and is also able to fit it better. Notice the low fidelity of the 4-spp average image compared to that of our vertical methods, even though the latter retain only one of the four input samples for every pixel.
Vertical methods. In Fig. 7 we compare our vertical iterative minimization to the histogram sampling of Heitz and Belcour [2019]. Both select one of several input samples (i.e., estimates) for each pixel. Our method produces a notably better result even when given 16× fewer samples to choose from. The perceptual error of histogram sampling does not vanish with increasing sample count: it dithers pixel intensity rather than pixel error, thus more samples help improve the intensity distribution but not the error magnitude.
Figure 1 shows our most capable method: vertical iterative minimization with the search space extended to the power set of the input samples (of size 2⁴ − 1 = 15 for 4 input spp; see Section 4.3). We compare surrogate-driven optimization against the best-case result: optimization w.r.t. the ground truth. Both achieve high fidelity, with little difference between them and with a pronounced local blue-noise error distribution, corroborated by the tiled power spectra.
Horizontal methods & animation. For rendering static images, horizontal methods are at a disadvantage compared to vertical ones due to the required post-optimization re-rendering. As Heitz and Belcour [2019] note, in an animation setting this sampling overhead can be mitigated by reusing the result of one frame as the initial estimate for the next. In Fig. 8 we compare their permutation method to our horizontal iterative minimization. For theirs we shift a void-and-cluster mask in screen space per frame and apply retargeting; for ours we traverse the image pixels in a different random order each frame. We intentionally keep the scenes static to test the methods' best-case abilities to improve the error distribution over frames.

Starting from a random initial estimate, our method can benefit from a progressively improving surrogate that helps fine-tune the error distribution via localized pixel swaps. The permutation method operates in grayscale within static non-overlapping tiles, which prevents it from making significant progress after the first frame. While mispredictions cause local deviations from blue noise in both results, they are stronger in the permutation method's. This is evident when comparing the corresponding prediction images, i.e., the results of optimization right before re-rendering. The permutation method's retargeting pass breaks the blocky image structure caused by tile-based optimization but increases the number of mispredictions.

The supplemental video shows animations with all methods, where vertical ones are fed a new random estimate per frame. Even without accumulating information over time, these consistently beat the two horizontal methods. The latter suffer from mispredictions under fast motion and perform similarly to one another, though ours remains superior in the presence of temporal smoothness. Mispredictions could be eliminated by optimizing frames independently and splitting the sampling budget into optimization and re-rendering halves (as in Fig. 6), though at the cost of reduced sampling quality.
Output images
RandomRandom
1 spp1 spp
RandomRandom
4-spp average4-spp average
Time: 0.08 secTime: 0.08se c
Vertical: HistogramVertical: Histogram
[Heitz and Belcour 2019][Heitz and Belcour 2019]
Time: 0.07 secTime: 0.07se c
Horizontal: PermutationHorizontal: Permutation
[Heitz and Belcour 2019][Heitz and Belcour 2019]
Time: 8.30 secTime: 8.30se c
Horizontal: IterativeHorizontal: Iterative
(ours)(ours)
Time: 0.04 secTime: 0.04se c
Vertical: DitheringVertical: Dithering
(ours)(ours)
Time: 0.04 secTime: 0.04se c
Vertical: Error diusionVertical: Error diusion
(ours)(ours)
Time: 15.2secTime: 15.2sec
Vertical: IterativeVertical: Iterative
(ours)(ours)
Output zoom-ins
HDR-VDP-2 error-detection maps
0%0% 100%100%
Fig. 6. Comparison of our algorithms against the permutation and histogram methods of Heitz and Belcour [2019] with equal total sampling cost of 4 spp.
Bottom row shows HDR-VDP-2 error-detection maps (blue is better, i.e., lower detection probability). The baseline 1-spp and 4-spp images exhibit large
perceptual error, while our vertical iterative minimization achieves highest fidelity. Error diffusion produces similar quality. Dithering is as fast but shows
smaller improvement over the baselines, yet significantly outperforms the similar histogram method. Our horizontal iterative optimization does better than
the permutation method. Our methods also reduce MSE compared to the 4-spp baseline, even though they do not focus solely on per-pixel error (see Table 1).
[Figure: three scene crops, each comparing Histogram (1/16 spp), Iterative, ours (1/4 spp), Histogram (1/64 spp), and the 4-spp average.]
Fig. 7. With a search space of only 4 spp, our vertical iterative minimization outperforms histogram sampling [Heitz and Belcour 2019] with 16× more input
samples. Please zoom in to fully appreciate the differences; the full-size images are included in the supplemental material.
26:12 Vassillen Chizhov, Iliyan Georgiev, Karol Myszkowski, and Gurprit Singh
[Figure: frames 1 and 16 and the corresponding prediction images for three scenes, comparing Permutation against Iterative (ours).]
Fig. 8. Comparison of our horizontal iterative minimization against the permutation method of Heitz and Belcour [2019] (with retargeting) on 16-frame
sequences of static scenes rendered at 4 spp. Our method does a better job at improving the error distribution frame-to-frame.
Additional comparisons. Figure 9 shows additional results from
our horizontal and vertical minimization and error diffusion. We
compare these to the permutation method of Heitz and Belcour
[2019], which we found to perform better than their histogram approach
on static scenes at equal sampling rates. For the horizontal
methods we show the results after 16 iterations. Our methods again
yield lower error, subjectively and numerically (see Tables 1 and 2).
8 DISCUSSION
8.1 Bias towards surrogate
While ultimately we want to optimize w.r.t. the ground-truth image,
in practice we have to rely on a surrogate. In our experiments we
use reasonably high-quality surrogates, shown in Fig. 12, to best
demonstrate the capabilities of our framework. But when using
a surrogate of low quality, fitting too closely to it can produce an
estimate with artifacts. In such cases less aggressive fitting may yield
lower perceptual error. To explore the trade-off, in Appendix B we
augment the perceptual error with a term that penalizes deviations
from the initial estimate Q_init (which in the case of vertical optimization
is obtained by averaging the input per-pixel estimates):
𝐸𝒞=(1𝒞)𝑔
𝑔
𝑔2
1𝑄
𝑄
𝑄𝑄
𝑄
𝑄init2
2+𝒞𝐸. (16)
The parameter C ∈ [0, 1] encodes our confidence in the surrogate
quality. Setting C = 1 reverts to the original formulation (15), while
optimizing with C = 0 yields the initial image estimate Q_init. Optimizing
w.r.t. this energy can also be interpreted as projecting the
surrogate onto the space of Monte Carlo estimates in Ω, with control
over the fitting power of the projection via C.
In Fig. 10, we plug the extended error formulation (16) into our
vertical iterative minimization. The results indicate that the visually
best result is achieved for different values of C depending on the
surrogate quality. Specifically, when optimizing w.r.t. the ground
truth, the fitting should be most aggressive: C = 1. Conversely, if
the surrogate contains structural artifacts, the optimization should
be made less biased to it, e.g., by setting C = 0.5. Other ways to
control this bias are using a more restricted search space (e.g., non-power-set)
and capping the number of minimization iterations of
our methods. Note that the methods of Heitz and Belcour [2019]
rely on implicit surrogates and energies and thus provide no control
over this trade-off. We have found that their permutation method
generally avoids tiling artifacts induced by their piecewise-constant
surrogate due to the retargeting step blurring the prediction image
(shown in Fig. 8 zoom-ins); however, this blurring adds mispredictions
which deteriorate the final image quality. Our methods provide
better fits, target the error explicitly, and are much superior when
the surrogate is good. With a bad surrogate, ours can be controlled
to never do worse than theirs.
8.2 Denoising
Our images are optimized for eliminating error and preserving
features when blurred with a given kernel. This blurring can be
seen as a simple form of denoising, and it is reasonable to expect
that the images are also better suited for general-purpose denoising
[Figure: three scenes, each comparing Horizontal: Permutation, Horizontal: Iterative (ours), Vertical: Error diffusion (ours), Vertical: Iterative (ours), and the 4-spp average.]
Fig. 9. Comparison of our methods against the permutation approach of Heitz and Belcour [2019] at 4 spp; for the horizontal methods we show the result of
the 16th frame of static-scene rendering. Our two iterative minimization algorithms yield the best quality, while error diffusion is fastest (see Tables 1 and 2).
[Figure: columns show three surrogates (ground truth; denoised per-pixel average; tile-wise sample average) with optimization results for C = 1, 0.75, 0.5, and 0.]
Fig. 10. Balancing our iterative optimization between the surrogate (top
row) and the initial estimate (bottom row) via the parameter C from Eq. (16).
For high-quality surrogates (left and middle columns), the best result is
achieved for values of C close to 1. In contrast, strong structural artifacts
(right column) call for lowering C to avoid fitting too closely to the surrogate.
The (subjectively) best image in each column is outlined in red.
than traditional white-noise renderings are [Heitz and Belcour 2019;
Belcour and Heitz 2021]. However, we have found that obtaining
such benefit is not straightforward.
In Fig. 11 we run Intel Open Image Denoise on the results from
our vertical iterative minimization. On the left scene, the input
samples have white-noise image distribution with large magnitude;
feeding their per-pixel averages to the denoiser, it cannot
reliably separate the signal from the noise and produces conspicuous
artifacts. Using this denoised image as a surrogate for our
optimization yields a "regularized" version of the input that is
easier for the denoiser to subsequently filter. This process can be
seen as projecting the initial denoised image back onto the space of
exact per-pixel estimates (while minimizing the pMSE) whose subsequent
denoising avoids artifacts. Note that obtaining this improved
result requires no additional pixel sampling.
On the right scene in Fig. 11, the moderate input-noise level is
easy for the denoiser to clean while preserving the faint shadow
on the wall. Our optimization subsequently produces an excellent
result which yields a high-fidelity image when convolved with the
optimization kernel g. Yet that same result is ruined by the denoiser,
which eradicates the shadow, even though subjectively its signal-to-noise
ratio is higher than that of the input image. Overall, the
denoiser blurs our result aggressively on both scenes, eliminating
not only the high-frequency noise but also lower-frequency signal
not present in auxiliary input feature buffers (depth, normals, etc.).
It should not be too surprising that an image optimized for one
smoothing kernel does not always yield good results when filtered
with other kernels. As an example, Fig. 5 shows clearly that the
optimal noise distribution varies significantly across different kernels.
While our kernel g has narrow support and fixed shape, denoising
kernels vary wildly over the image and are inferred from the input
in order to preserve features. Importantly, modern kernel-inference
models (like in the used denoiser) are designed (or trained) to expect
mutually uncorrelated pixel estimates [Intel 2018]. This white-noise-error
assumption can also yield wide smoothing kernels that are
unnecessarily aggressive for blue-noise distributions; we suspect
[Figure: two scenes, each showing the 4-spp input, the denoised input, our result, our result denoised, and our result g-convolved, with numbered zoom-ins 1–4.]
Fig. 11. By regularizing a noisy input, our optimization can help a denoiser
avoid producing artifacts (left scene), even though it targets a different
(perceptual) smoothing kernel g. However, it can also cause elimination of
image features during denoising (right scene, the shadow).
this is what hinders the denoiser from detecting features present in
our optimized results whose pixels are highly correlated.
Our rm belief is that denoising could consistently benet from
error optimization, though that would require better coordination
between the two. One avenue for future work would be to tailor
the optimization to the kernels employed by a target denoiser. Con-
versely, denoising could be adapted to ingest correlated pixel esti-
mates with high-frequency error distribution; this would enable the
use of less aggressive smoothing kernels (see Fig. 3) and facilitate
feature preservation. As a more immediate treatment, image fea-
tures could be enhanced before or after our optimization to mitigate
the risk of them being eliminated by denoising.
8.3 Performance and utility
Throughout our experiments, we have found that the tested algorithms
rank in the following order in terms of increasing ability to
minimize perceptual error on static scenes at equal sampling cost:
histogram sampling, our dithering, permutation, our error diffusion,
our horizontal iterative, our vertical iterative. The three lowest-ranked
methods employ some form of dithering which by design
assumes (a) constant image signal and (b) equi-spaced quantization
levels shared by all pixels. The latter assumption is severely broken
in the rendering setting where the "quantization levels" arise from
(random) pixel estimation. Our vertical methods (dithering, error
diffusion, iterative) are more practical than the histogram sampling
of Heitz and Belcour [2019] as they can achieve high fidelity with a
much lower input-sample count. Horizontal algorithms are harder
to control due to their mispredictions, which are further exacerbated
when reusing estimates across frames in dynamic scenes.
Our iterative minimizations can best adapt to the input and also
directly benefit from the extensions in Section 6 (unlike all others).
However, they are also the slowest, as evident in Table 2. Fortunately,
they can be sped up by several orders of magnitude through
additional optimizations from halftoning literature [Analoui and
Modern living roomModern living room Grey & white roomGrey & white room San MiguelSan Miguel
Wooden staircaseWooden staircase Japanese classroomJapanese classroom White roomWhite room
BathroomBathroom Modern hallModern hall
Fig. 12. Collage of the surrogates used in our experiments, obtained by
denoising the input estimates using Intel Open Image Denoise [Intel 2018].
Allebach 1992;Koge et al
.
2014]; we discuss these optimizations in
the context of our rendering setting in supplemental Section 3.
Error diusion is often on par with vertical iterative minimization
in quality and with dithering-based methods in run time. In a single-
threaded implementation it can outperform all others; parallel error-
diusion variants exist too [Metaxas 2003].
Practical utility. Our methods can enhance the perceptual fidelity
of static and dynamic renderings, as demonstrated by our experiments.
For best results and maximum flexibility, we suggest using
our vertical iterative optimization, optionally with the efficiency
improvements mentioned above. Figure 10 illustrates that in practical
scenarios (middle and right columns) this method can improve
upon both the surrogate (top row) and the input-estimate average
(bottom row) for a suitable value of the confidence parameter C.
For maximum efficiency we recommend using our vertical error
diffusion. To obtain a surrogate, we recommend regularizing the
input estimates via fast denoising or more basic bilateral or non-local-means
filtering. Our optimization can then be interpreted as
reducing bias or artifacts in such denoised images (see Fig. 10). Simple
denoising of the result may yield better quality than traditional
aggressive denoising of the input samples.
Progressive rendering. Our optimization methods produce biased
pixel estimates through manipulating the input samples; this is true
even for a-priori methods where the sampling is completely deterministic.
Nevertheless, consistency can be achieved through a
simple progressive-rendering scheme: For each pixel, newly generated
samples are cumulatively averaged into a fixed set of per-pixel
estimates that are periodically passed to the optimization to obtain
an updated image. Each individual estimate will converge to the
true pixel value, thus the optimized image will also approach the
ground truth, with bounded memory footprint. Interestingly, convergence
is guaranteed regardless of the optimization method and
surrogate used, though better methods and surrogates will yield better
starting points. Lastly, adaptive sampling is naturally supported
by vertical methods as they are agnostic of differences in sample
counts between pixels.
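The bounded-memory bookkeeping described above can be sketched for a single pixel as follows; this is a hypothetical helper illustrating the scheme, not the paper's code:

```python
import numpy as np

class ProgressivePixel:
    """Cumulatively averages new samples into a fixed set of per-pixel
    estimates (round-robin), keeping memory bounded while every slot
    converges to the true pixel value."""

    def __init__(self, n_estimates=4):
        self.sums = np.zeros(n_estimates)
        self.counts = np.zeros(n_estimates, dtype=int)
        self.slot = 0  # slot receiving the next sample

    def add_sample(self, value):
        self.sums[self.slot] += value
        self.counts[self.slot] += 1
        self.slot = (self.slot + 1) % len(self.sums)

    def estimates(self):
        # The optimizer periodically reads these and picks one per pixel
        # (the vertical search space); any choice converges.
        return self.sums / np.maximum(self.counts, 1)
```

Since every slot is itself a converging Monte Carlo average, whichever estimate the optimization selects approaches the ground-truth pixel value.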
Table 1. MSE and pMSE (Section 7.1) metrics for various methods (ours in bold) and scenes. For horizontal methods we show the metrics for the 16th frame
of static-scene rendering. In each section we highlight the lowest error number per column. For the same number of samples per pixel (spp), our methods
consistently outperform those of Heitz and Belcour [2019], the current state of the art; the exception is our dithering, which can do worse than their permutation method.
Method Bathroom Classroom Gray Room Living Room Modern Hall San Miguel Staircase White Room
MSE pMSE MSE pMSE MSE pMSE MSE pMSE MSE pMSE MSE pMSE MSE pMSE MSE pMSE
×10⁻² ×10⁻³ ×10⁻² ×10⁻³ ×10⁻² ×10⁻² ×10⁻² ×10⁻³ ×10⁻² ×10⁻² ×10⁻² ×10⁻³ ×10⁻³ ×10⁻³ ×10⁻² ×10⁻³
Random (4-spp average) 1.40 3.15 3.13 7.91 7.91 3.02 3.37 5.61 5.22 1.70 3.58 8.92 8.88 5.60 2.78 7.98
Vertical: Histogram [2019] (1/4 spp) 3.58 6.29 7.11 13.08 11.49 6.67 5.75 9.88 11.43 3.60 6.84 16.52 18.90 6.69 5.75 14.09
Vertical: Error diffusion (1/4 spp) 1.22 2.27 4.91 7.03 8.76 2.82 2.08 2.31 4.86 1.33 5.07 8.50 6.87 5.08 2.19 5.16
Vertical: Dithering (1/4 spp) 1.31 3.31 4.36 11.63 8.46 5.07 2.27 4.43 5.25 1.80 3.74 11.19 7.80 5.36 2.51 7.95
Vertical: Iterative (1/4 spp) 2.32 2.02 6.00 6.10 9.07 2.97 4.32 1.86 7.15 1.29 5.51 7.05 10.50 4.45 3.98 5.00
Vertical: Iterative (power set, 1/15 “spp”) 1.26 1.66 3.12 4.91 7.53 2.82 2.46 1.13 4.55 1.18 3.31 5.85 7.08 4.31 2.26 4.58
Horizontal: Permut. [2019] (frame 16, 4 spp) 1.40 2.79 3.15 7.25 7.90 2.84 3.38 3.14 5.21 1.51 3.59 8.51 8.87 5.40 2.72 6.73
Horizontal: Iterative (frame 16, 4 spp) 1.52 2.06 3.83 5.31 8.34 2.41 3.59 1.59 5.46 1.18 3.94 7.31 7.67 4.30 2.93 4.72
Random (16-spp average) 0.49 1.47 1.55 4.89 3.77 1.04 1.23 2.18 2.14 0.80 1.10 4.67 3.39 3.78 1.35 3.62
Vertical: Histogram [2019] (4/16 spp) 1.40 2.37 3.12 6.20 7.88 2.72 3.36 3.57 5.23 1.48 3.52 6.82 7.13 4.09 2.77 5.77
Vertical: Error diffusion (4/16 spp) 0.41 1.20 0.94 3.85 4.00 0.87 0.86 1.07 1.68 0.66 1.33 4.70 2.76 3.69 0.73 2.13
Vertical: Dithering (4/16 spp) 0.50 1.52 1.15 4.69 4.12 1.36 1.09 1.82 1.93 0.83 1.49 5.38 3.09 3.73 0.91 2.98
Vertical: Iterative (4/16 spp) 0.90 1.10 2.03 3.35 5.17 0.84 2.30 0.84 3.03 0.64 2.39 4.02 4.46 3.14 1.75 1.99
Table 2. Optimization run times (in seconds) for various methods (ours in bold) and scenes using 4 input samples per pixel (spp), excluding sampling and
surrogate construction. For horizontal methods we report the average time over 16 frames. Our error diffusion and dithering avoid sorting and are fastest;
though dithering-based, Heitz and Belcour's approaches use sorting. Our iterative minimization methods are slowest (but can be sped up; see Section 8.3).
Method Bathroom Classroom Gray Room Living Room Modern Hall San Miguel Staircase White Room
Vertical: Histogram [2019] (1/4 spp) 0.06 0.07 0.11 0.06 0.02 0.09 0.08 0.06
Vertical: Error diffusion (1/4 spp) 0.04 0.03 0.04 0.04 0.01 0.06 0.04 0.04
Vertical: Dithering (1/4 spp) 0.04 0.03 0.04 0.04 0.01 0.05 0.04 0.04
Vertical: Iterative (1/4 spp) 18.44 111.41 12.82 15.26 5.43 29.09 15.21 19.45
Vertical: Iterative (power set, 1/15 “spp”) 95.09 404.12 59.69 83.41 23.93 137.89 35.39 102.05
Horizontal: Permutation [2019] (frame 16) 0.10 0.10 0.10 0.11 0.03 0.21 0.10 0.14
Horizontal: Iterative (frame 16) 23.04 21.57 22.00 30.08 8.48 36.36 23.78 22.76
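For reference, the relation between the MSE and pMSE columns above can be sketched as follows. We assume here, based on the surrounding text, that pMSE is the MSE of the difference image after filtering with the perceptual kernel g (its precise definition is in Section 7.1, outside this excerpt), and we model g as a Gaussian:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mse(img, ref):
    # Plain per-pixel mean squared error.
    return np.mean((img - ref) ** 2)

def pmse(img, ref, sigma=1.0):
    # Perceptual MSE: mean squared error of the kernel-filtered difference,
    # with a Gaussian standing in for the perceptual kernel g.
    return np.mean(gaussian_filter(img - ref, sigma) ** 2)
```

High-frequency (blue-noise) error largely cancels under the kernel, so it scores a much lower pMSE than low-frequency error of equal MSE; this is consistent with Table 1, where the optimized results improve pMSE more than MSE.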
9 CONCLUSION
We devise a formal treatment of image-space error distribution
in Monte Carlo rendering from both quantitative and perceptual
aspects. Our formulation bridges the gap between halftoning and
rendering by interpreting the error distribution problem as an ex-
tension of non-uniform multi-tone energy minimization halftoning.
To guide the distribution of rendering error, we employ a percep-
tual kernel-based model whose practical optimization can deliver
improvements not achievable by prior methods given the same
sampling data. Our model provides valuable insights as well as a
framework to further study the problem and its solutions.
A promising avenue for future research is to adapt even stronger
perceptual error models. Prior work has already demonstrated a
strong potential in reducing Monte Carlo noise visibility error using
visual masking [Bolin and Meyer 1998; Ramasubramanian et al. 1999].
Robust metrics, other than the squared ℓ₂ norm, can also be
considered, with possible nonlinear relationships.
Our framework could conceivably be extended beyond the hu-
man visual system, i.e., for optimizing the inputs to other types
of image processing such as denoising. For such tasks, one could
consider lifting the assumption of a xed kernel to obtain an even
more general problem where the kernel and sample distribution are
optimized simultaneously (or alternatingly).
ACKNOWLEDGMENTS
Our results show scenes (summarized in Fig. 12) coming from third
parties. We acknowledge the PBRT scene repository for San Miguel
and Bathroom. Wooden staircase, Modern hall, Modern living room,
Japanese classroom, White room, Grey & white room, and Utah teapot
have been provided by Benedikt Bitterli. The first author is funded
by the European Research Council (ERC) under the European
Union's Horizon 2020 research and innovation program (grant
agreement 741215, ERC Advanced Grant INCOVID).
REFERENCES
Abdalla G. M. Ahmed and Peter Wonka. 2020. Screen-Space Blue-Noise Diffusion of
Monte Carlo Sampling Error via Hierarchical Ordering of Pixels. ACM Trans. Graph.
(Proc. SIGGRAPH Asia) 39, 6, Article 244 (2020). https://doi.org/10.1145/3414685.
3417881
Jan P. Allebach and B. Liu. 1976. Random quasi-periodic halftone process. Journal of
the Optical Society of America 66, 9 (Sep 1976), 909–917. https://doi.org/10.1364/
JOSA.66.000909
Mostafa Analoui and Jan P. Allebach. 1992. Model-based halftoning using direct
binary search. In Human Vision, Visual Processing, and Digital Display III, Bernice E.
Rogowitz (Ed.), Vol. 1666. International Society for Optics and Photonics, SPIE, 96 –
108. https://doi.org/10.1117/12.135959
Dimitris Anastassiou. 1989. Error diusion coding for A/D conversion. IEEE Transactions
on Circuits and Systems 36, 9 (1989), 1175–1186. https://doi.org/10.1109/31.34663
Walter Arrighetti. 2017. The Academy Color Encoding System (ACES): A Professional
Color-Management Framework for Production, Post-Production and Archival of
Still and Motion Pictures. Journal of Imaging 3 (09 2017), 40. https://doi.org/10.
3390/jimaging3040040
Peter G.J. Barten. 1999. Contrast sensitivity of the human eye and its effects on image
quality. SPIE – The International Society for Optical Engineering. https://doi.org/
10.1117/3.353254
Bryce E. Bayer. 1973. An optimum method for two-level rendition of continuous-
tone pictures. In Proceedings of IEEE International Conference on Communications,
Conference Record, Vol. 26. 11–15.
Laurent Belcour and Eric Heitz. 2021. Lessons Learned and Improvements When Build-
ing Screen-Space Samplers with Blue-Noise Error Distribution. In ACM SIGGRAPH
2021 Talks (Virtual Event, USA) (SIGGRAPH ’21). Association for Computing Machin-
ery, New York, NY, USA, Article 9, 2 pages. https://doi.org/10.1145/3450623.3464645
Mark R. Bolin and Gary W. Meyer. 1995. A Frequency Based Ray Tracer. In Proceedings
of the 22nd Annual Conference on Computer Graphics and Interactive Techniques (SIG-
GRAPH ’95). Association for Computing Machinery, New York, NY, USA, 409–418.
https://doi.org/10.1145/218380.218497
Mark R. Bolin and Gary W. Meyer. 1998. A Perceptually Based Adaptive Sampling
Algorithm. In Proceedings of the 25th Annual Conference on Computer Graphics and
Interactive Techniques (SIGGRAPH ’98). Association for Computing Machinery, New
York, NY, USA, 299–309. https://doi.org/10.1145/280814.280924
A. Celarek, W. Jakob, M. Wimmer, and J. Lehtinen. 2019. Quantifying the Error of
Light Transport Algorithms. Computer Graphics Forum 38, 4 (2019), 111–121. https:
//doi.org/10.1111/cgf.13775
Chakravarty R. Alla Chaitanya, Anton S. Kaplanyan, Christoph Schied, Marco Salvi,
Aaron Lefohn, Derek Nowrouzezahrai, and Timo Aila. 2017. Interactive Recon-
struction of Monte Carlo Image Sequences Using a Recurrent Denoising Au-
toencoder. ACM Trans. Graph. 36, 4, Article 98 (jul 2017), 12 pages. https:
//doi.org/10.1145/3072959.3073601
Jianghao Chang, Benoît Alain, and Victor Ostromoukhov. 2009. Structure-Aware Error
Diusion. ACM Trans. Graph. 28, 5 (dec 2009), 1–8. https://doi.org/10.1145/1618452.
1618508
Scott J. Daly. 1987. Subroutine for the Generation of a Two Dimensional Human Visual
Contrast Sensitivity Function. Technical Report 233203Y. Eastman Kodak: Rochester,
NY, USA.
Scott J. Daly. 1992. Visible dierences predictor: an algorithm for the assessment of
image delity. In Human Vision, Visual Processing, and Digital Display III, Bernice E.
Rogowitz (Ed.), Vol. 1666. International Society for Optics and Photonics, SPIE, 2 –
15. https://doi.org/10.1117/12.135952
Robin J. Deeley, Neville Drasdo, and W. Neil Charman. 1991. A simple parametric model
of the human ocular modulation transfer function. Ophthalmic and Physiological
Optics 11, 1 (1991), 91–93. https://doi.org/10.1111/j.1475-1313.1991.tb00200.x
James A. Ferwerda, Sumanta N. Pattanaik, Peter Shirley, and Donald P. Greenberg. 1996.
A Model of Visual Adaptation for Realistic Image Synthesis. In Proceedings of the
23rd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH
’96). Association for Computing Machinery, New York, NY, USA, 249–258. https:
//doi.org/10.1145/237170.237262
Robert W. Floyd and Louis Steinberg. 1976. An Adaptive Algorithm for Spatial Greyscale.
Proceedings of the Society for Information Display 17, 2 (1976), 75–77.
Iliyan Georgiev and Marcos Fajardo. 2016. Blue-Noise Dithered Sampling. In ACM
SIGGRAPH 2016 Talks (Anaheim, California) (SIGGRAPH ’16). Association for Com-
puting Machinery, New York, NY, USA, Article 35, 1 pages. https://doi.org/10.1145/
2897839.2927430
A. J. González, J. Bacca, G. R. Arce, and D. L. Lau. 2006. Alpha stable human visual system
models for digital halftoning. In Human Vision and Electronic Imaging XI, Bernice E.
Rogowitz, Thrasyvoulos N. Pappas, and Scott J. Daly (Eds.), Vol. 6057. International
Society for Optics and Photonics, SPIE, 180 – 191. https://doi.org/10.1117/12.643540
Eric Heitz and Laurent Belcour. 2019. Distributing Monte Carlo Errors as a Blue Noise
in Screen Space by Permuting Pixel Seeds Between Frames. Computer Graphics
Forum 38, 4 (2019), 149–158. https://doi.org/10.1111/cgf.13778
Eric Heitz, Laurent Belcour, V. Ostromoukhov, David Coeurjolly, and Jean-Claude Iehl.
2019. A Low-Discrepancy Sampler That Distributes Monte Carlo Errors as a Blue
Noise in Screen Space. In ACM SIGGRAPH 2019 Talks (Los Angeles, California)
(SIGGRAPH ’19). Association for Computing Machinery, New York, NY, USA, Article
68, 2 pages. https://doi.org/10.1145/3306307.3328191
Edwin Hewitt and Kenneth A. Ross. 1994. Abstract Harmonic Analysis: Volume I, Structure
of Topological Groups, Integration Theory, Group Representations. Springer New York.
Sam Hocevar and Gary Niger. 2008. Reinstating Floyd-Steinberg: Improved Metrics
for Quality Assessment of Error Diusion Algorithms, Vol. 5099. 38–45. https:
//doi.org/10.1007/978-3- 540-69905- 7_5
Intel. 2018. Intel Open Image Denoise. https://www.openimagedenoise.org.
James T. Kajiya. 1986. The Rendering Equation. SIGGRAPH Comput. Graph. 20, 4 (aug
1986), 143–150. https://doi.org/10.1145/15886.15902
Anton S. Kaplanyan, Anton Sochenov, Thomas Leimkühler, Mikhail Okunev, Todd
Goodall, and Gizem Rufo. 2019. DeepFovea: Neural Reconstruction for Foveated
Rendering and Video Compression Using Learned Statistics of Natural Videos. ACM
Trans. Graph. 38, 6, Article 212 (nov 2019), 13 pages. https://doi.org/10.1145/3355089.
3356557
Hiroaki Koge, Yasuaki Ito, and Koji Nakano. 2014. A GPU Implementation of Clipping-
Free Halftoning Using the Direct Binary Search. In Algorithms and Architectures
for Parallel Processing, Xian-he Sun, Wenyu Qu, Ivan Stojmenovic, Wanlei Zhou,
Zhiyang Li, Hua Guo, Geyong Min, Tingting Yang, Yulei Wu, and Lei Liu (Eds.).
Springer International Publishing, Cham, 57–70. https://doi.org/10.1007/978-3-319-
11197-1_5
Lauwerens Kuipers and Harald Niederreiter. 1974. Uniform Distribution of Sequences.
Wiley, New York, USA.
Alexandr Kuznetsov, Nima Khademi Kalantari, and Ravi Ramamoorthi. 2018. Deep
Adaptive Sampling for Low Sample Count Rendering. Computer Graphics Forum 37,
4 (2018), 35–44. https://doi.org/10.1111/cgf.13473
Daniel L. Lau and Gonzalo R. Arce. 2007. Modern Digital Halftoning, Second Edition.
CRC Press, Inc., USA.
Tony Lindeberg. 1990. Scale-space for discrete signals. IEEE Transactions on Pattern
Analysis and Machine Intelligence 12, 3 (1990), 234–254. https://doi.org/10.1109/34.
49051
Jerey Lubin. 1995. A Visual Discrimination Model for Imaging System Design and Eval-
uation. In Vision Models for Target Detection and Recognition, Eli Peli (Ed.). World Sci-
entic Publishing Company, Inc., 245–283. https://doi.org/10.1142/9789812831200_
0010
James L. Mannos and David J. Sakrison. 1974. The eects of a visual delity criterion
of the encoding of images. IEEE Transactions on Information Theory 20, 4 (1974),
525–536. https://doi.org/10.1109/TIT.1974.1055250
Rafał Mantiuk, Scott J. Daly, Karol Myszkowski, and Hans-Peter Seidel. 2005. Predicting
visible dierences in high dynamic range images: model and its calibration. In
Human Vision and Electronic Imaging X, Bernice E. Rogowitz, Thrasyvoulos N.
Pappas, and Scott J. Daly (Eds.), Vol. 5666. International Society for Optics and
Photonics, SPIE, 204 – 214. https://doi.org/10.1117/12.586757
Rafał Mantiuk, Kil Joong Kim, Allan G. Rempel, and Wolfgang Heidrich. 2011. HDR-
VDP-2: A Calibrated Visual Metric for Visibility and Quality Predictions in All
Luminance Conditions. ACM Trans. Graph. 30, 4, Article 40 (jul 2011), 14 pages.
https://doi.org/10.1145/2010324.1964935
Panagiotis Takis Metaxas. 2003. Parallel Digital Halftoning by Error-Diffusion. In
Proceedings of the Paris C. Kanellakis Memorial Workshop on Principles of Computing
& Knowledge. 35–41. https://doi.org/10.1145/778348.778355
Don P. Mitchell. 1991. Spectrally Optimal Sampling for Distribution Ray Tracing. In
Proc. ACM SIGGRAPH, Vol. 25. 157–164. https://doi.org/10.1145/127719.122736
Theophano Mitsa and Kevin J. Parker. 1991. Digital halftoning using a blue noise mask.
In Proceedings of International Conference on Acoustics, Speech, and Signal Processing.
2809–2812 vol.4. https://doi.org/10.1109/ICASSP.1991.150986
Kathy T. Mullen. 1985. The contrast sensitivity of human colour vision to red-green
and blue-yellow chromatic gratings. Journal of Physiology 359 (1985), 381–400.
https://doi.org/10.1113/jphysiol.1985.sp015591
Karol Myszkowski. 1998. The Visible Dierences Predictor: applications to global
illumination problems. In Rendering Techniques ’98, George Drettakis and Nelson
Max (Eds.). Springer Vienna, Vienna, 223–236. https://doi.org/10.1007/978- 3-7091-
6453-2_21
Risto Näsänen. 1984. Visibility of halftone dot textures. IEEE Transactions on Systems,
Man, and Cybernetics SMC-14, 6 (1984), 920–924. https://doi.org/10.1109/TSMC.
1984.6313320
Harald Niederreiter. 1992. Random Number Generation and quasi-Monte Carlo Methods.
Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.
Victor Ostromoukhov. 2001. A Simple and Ecient Error-Diusion Algorithm (SIG-
GRAPH ’01). Association for Computing Machinery, New York, NY, USA, 567–572.
https://doi.org/10.1145/383259.383326
Wai-Man Pang, Yingge Qu, Tien-Tsin Wong, Daniel Cohen-Or, and Pheng-Ann Heng.
2008. Structure-Aware Halftoning. ACM Trans. Graph. 27, 3 (aug 2008), 1–8. https:
//doi.org/10.1145/1360612.1360688
Thrasyvoulos N. Pappas and David L. Neuho. 1999. Least-squares model-based
halftoning. IEEE Transactions on Image Processing 8, 8 (Aug 1999), 1102–1116.
https://doi.org/10.1109/83.777090
Eli Peli, Jian Yang, and Robert B. Goldstein. 1991. Image invariance with changes in size:
the role of peripheral contrast thresholds. Journal of the Optical Society of America
8, 11 (Nov 1991), 1762–1774. https://doi.org/10.1364/JOSAA.8.001762
Matt Pharr, Wenzel Jakob, and Greg Humphreys. 2016. Physically Based Rendering:
From Theory To Implementation (3rd ed.). Morgan Kaufmann Publishers Inc.
Mahesh Ramasubramanian, Sumanta N. Pattanaik, and Donald P. Greenberg. 1999.
A Perceptually Based Physical Error Metric for Realistic Image Synthesis. In Pro-
ceedings of the 26th Annual Conference on Computer Graphics and Interactive Tech-
niques (SIGGRAPH ’99). ACM Press/Addison-Wesley Publishing Co., USA, 73–82.
https://doi.org/10.1145/311535.311543
Rashid Ansari, Christine Guillemot, and Nasir Memon. 2005. 5.5 - Lossy Image Compression:
JPEG and JPEG2000 Standards. In Handbook of Image and Video Processing
(Second Edition), Al Bovik (Ed.). Academic Press, Burlington,
709–XXII. https://doi.org/10.1016/B978-012119792-6/50105-4
ACM Trans. Graph., Vol. 41, No. 3, Article 26. Publication date: June 2022.
J.G. Robson and Norma Graham. 1981. Probability summation and regional variation
in contrast sensitivity across the visual field. Vision Research 21, 3 (1981), 409–418.
https://doi.org/10.1016/0042-6989(81)90169-3
Gurprit Singh, Cengiz Öztireli, Abdalla G.M. Ahmed, David Coeurjolly, Kartic Subr,
Oliver Deussen, Victor Ostromoukhov, Ravi Ramamoorthi, and Wojciech Jarosz.
2019. Analysis of Sample Correlations for Monte Carlo Rendering. Computer
Graphics Forum 38, 2 (2019), 473–491. https://doi.org/10.1111/cgf.13653
Givago da Silva Souza, Bruno Duarte Gomes, and Luiz Carlos L. Silveira. 2011. Com-
parative neurophysiology of spatial luminance contrast sensitivity. Psychology &
Neuroscience 4 (06 2011), 29–48. https://doi.org/10.3922/j.psns.2011.1.005
Kevin E. Spaulding, Rodney L. Miller, and Jay S. Schildkraut. 1997. Methods for gener-
ating blue-noise dither matrices for digital halftoning. Journal of Electronic Imaging
6, 2 (1997), 208–230. https://doi.org/10.1117/12.266861
James R. Sullivan, Lawrence A. Ray, and Rodney Miller. 1991. Design of minimum visual
modulation halftone patterns. IEEE Transactions on Systems, Man, and Cybernetics
21, 1 (Jan 1991), 33–38. https://doi.org/10.1109/21.101134
Robert A. Ulichney. 1987. Digital Halftoning. MIT Press, Cambridge, Massachusetts.
Robert A. Ulichney. 1993. Void-and-cluster method for dither array generation. In
Human Vision, Visual Processing, and Digital Display IV, Jan P. Allebach and Bernice E.
Rogowitz (Eds.), Vol. 1913. International Society for Optics and Photonics, SPIE, 332–343. https://doi.org/10.1117/12.152707
Vladimir Volevich, Karol Myszkowski, Andrei Khodulev, and Edward A. Kopylov. 2000.
Using the Visual Differences Predictor to Improve Performance of Progressive
Global Illumination Computation. ACM Trans. Graph. 19, 2 (2000), 122–161. https://doi.org/10.1145/343593.343611
G. Westheimer. 1986. The eye as an optical instrument. In Handbook of Perception and
Human Performance: 1. Sensory Processes and Perception, K.R. Boff, L. Kaufman, and
J.P. Thomas (Eds.). Wiley, New York, 4.1–4.20.
Sophie Wuerger, Maliha Ashraf, Minjung Kim, Jasna Martinovic, María Pérez-Ortiz,
and Rafał K. Mantiuk. 2020. Spatio-chromatic contrast sensitivity under mesopic
and photopic light levels. Journal of Vision 20, 4 (04 2020), 23–23. https://doi.org/
10.1167/jov.20.4.23
M. Zwicker, W. Jarosz, J. Lehtinen, B. Moon, R. Ramamoorthi, F. Rousselle, P. Sen, C.
Soler, and S.-E. Yoon. 2015. Recent Advances in Adaptive Sampling and Reconstruc-
tion for Monte Carlo Rendering. Computer Graphics Forum 34, 2 (2015), 667–681.
https://doi.org/10.1111/cgf.12592
A ERROR DECOMPOSITION FOR HORIZONTAL
OPTIMIZATION
Substituting $\mathbf{Q}(\pi(\mathbf{S})) = \pi(\mathbf{Q}(\mathbf{S})) + \boldsymbol{\Delta}$ into Eq. (9), we bound the perceptual error using Jensen's inequality and the discrete Young convolution inequality [Hewitt and Ross 1994]:

$$E(\pi(\mathbf{S})) = \big\| \mathbf{g} * \big( \pi(\mathbf{Q}(\mathbf{S})) - \mathbf{I} + \boldsymbol{\Delta} \big) \big\|_2^2 \tag{17a}$$
$$= 4 \big\| 0.5\, \mathbf{g} * \big( \pi(\mathbf{Q}(\mathbf{S})) - \mathbf{I} \big) + 0.5\, \mathbf{g} * \boldsymbol{\Delta} \big\|_2^2 \tag{17b}$$
$$\leq 2 \big\| \mathbf{g} * \big( \pi(\mathbf{Q}(\mathbf{S})) - \mathbf{I} \big) \big\|_2^2 + 2\, \|\mathbf{g}\|_1^2\, \|\boldsymbol{\Delta}\|_2^2. \tag{17c}$$

The first term in Eq. (17c) involves pixel permutations in the readily available estimated image $\mathbf{Q}(\mathbf{S})$. In the second term we make an approximation that avoids rendering invocations: $\|\boldsymbol{\Delta}\|_2^2 \approx \sum_i d(i, \pi(i))$, where $d(i,j)$ measures the dissimilarity between the light-transport integrals of pixels $i$ and $j$. Dropping the constant 2, we take the resulting bound as the optimization energy $E_d$ in Eq. (10b).
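To give intuition for why minimizing a kernel-filtered error energy of this kind favors a blue-noise error layout, the following numpy sketch (our own minimal illustration, not the paper's implementation; the kernel and images are synthetic) compares two error distributions with identical per-pixel magnitudes under a low-pass kernel:

```python
import numpy as np

def conv2_same(img, g):
    """Brute-force 'same'-size 2D cross-correlation with zero padding
    (identical to convolution for the symmetric kernels used here)."""
    H, W = img.shape
    kh, kw = g.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((H, W))
    for dy in range(kh):
        for dx in range(kw):
            out += g[dy, dx] * padded[dy:dy + H, dx:dx + W]
    return out

def perceptual_error(Q, I, g):
    """E = ||g * (Q - I)||_2^2, cf. Eq. (17a) with Delta = 0."""
    return float(np.sum(conv2_same(Q - I, g) ** 2))

# Same per-pixel error magnitudes, two different screen-space layouts.
I = np.zeros((16, 16))
yy, xx = np.mgrid[0:16, 0:16]
Q_checker = np.where((xx + yy) % 2 == 0, 1.0, -1.0)   # high-frequency error
Q_clumped = np.where(xx < 8, 1.0, -1.0)               # low-frequency error
g = np.ones((3, 3)) / 9.0                             # low-pass kernel

E_checker = perceptual_error(Q_checker, I, g)
E_clumped = perceptual_error(Q_clumped, I, g)
```

The high-frequency layout is strongly attenuated by the low-pass kernel, yielding a much smaller energy, which is the basic mechanism that pushes the optimized error toward blue noise.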
B SURROGATE CONFIDENCE CONTROL
Here we extend our perceptual error formulation to account for deviations of the surrogate image $\tilde{\mathbf{I}}$ from the ground truth $\mathbf{I}$. We introduce a parameter $\mathcal{C} \in [0,1]$ that encodes our confidence in the quality of the surrogate and instructs the optimization how closely to fit to it. Given an initial image estimate $\mathbf{Q}_{\mathrm{init}}$ (the per-pixel estimate average for vertical optimization), we look to optimize for $\mathbf{Q}$. We begin with an artificial expansion:

$$E = \big\| \mathbf{g} * \mathbf{Q} - \mathbf{I} \big\|^2 \tag{18a}$$
$$= \big\| (1-\mathcal{C})(\mathbf{g}*\mathbf{Q} - \mathbf{g}*\mathbf{Q}_{\mathrm{init}}) + \mathcal{C}(\mathbf{g}*\mathbf{Q} - \tilde{\mathbf{I}}) + (1-\mathcal{C})(\mathbf{g}*\mathbf{Q}_{\mathrm{init}} - \mathbf{I}) + \mathcal{C}(\tilde{\mathbf{I}} - \mathbf{I}) \big\|^2. \tag{18b}$$

Using the triangle inequality we then obtain the following bound:

$$E \leq \Big( \big\| (1-\mathcal{C})(\mathbf{g}*\mathbf{Q} - \mathbf{g}*\mathbf{Q}_{\mathrm{init}}) + \mathcal{C}(\mathbf{g}*\mathbf{Q} - \tilde{\mathbf{I}}) \big\|_2 + \big\| (1-\mathcal{C})(\mathbf{g}*\mathbf{Q}_{\mathrm{init}} - \mathbf{I}) + \mathcal{C}(\tilde{\mathbf{I}} - \mathbf{I}) \big\|_2 \Big)^2. \tag{19}$$

The second term on the right-hand side can be dropped as it is independent of the optimization variable $\mathbf{Q}$. We bound the square of the first term using Jensen's and Young's convolution inequalities:

$$\big\| (1-\mathcal{C})(\mathbf{g}*\mathbf{Q} - \mathbf{g}*\mathbf{Q}_{\mathrm{init}}) + \mathcal{C}(\mathbf{g}*\mathbf{Q} - \tilde{\mathbf{I}}) \big\|_2^2 \tag{20a}$$
$$\leq (1-\mathcal{C})\, \|\mathbf{g}\|_1^2\, \|\mathbf{Q} - \mathbf{Q}_{\mathrm{init}}\|_2^2 + \mathcal{C}\, \|\mathbf{g}*\mathbf{Q} - \tilde{\mathbf{I}}\|_2^2. \tag{20b}$$

We take this bound to be our optimization energy in Eq. (16), noting that the squared norm in the second term is the original energy with the surrogate $\tilde{\mathbf{I}}$ substituted for the ground truth $\mathbf{I}$.

If a confidence map $\boldsymbol{\mathcal{C}}$ is available (e.g., as a byproduct of denoising), the minimization can be done with per-pixel control:

$$E_{\boldsymbol{\mathcal{C}}} = \|\mathbf{g}\|_1^2\, \big\| (\mathbf{1}-\boldsymbol{\mathcal{C}}) \odot (\mathbf{Q} - \mathbf{Q}_{\mathrm{init}}) \big\|_2^2 + \big\| \boldsymbol{\mathcal{C}} \odot (\mathbf{g}*\mathbf{Q} - \tilde{\mathbf{I}}) \big\|_2^2. \tag{21}$$
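A direct transcription of Eq. (21) can be written in a few lines. The sketch below is our own illustration under simplifying assumptions (a small box kernel, zero-padded filtering, synthetic images); it is not the paper's code:

```python
import numpy as np

def conv2_same(img, g):
    """'Same'-size 2D cross-correlation with zero padding (g symmetric)."""
    H, W = img.shape
    kh, kw = g.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    return sum(g[dy, dx] * padded[dy:dy + H, dx:dx + W]
               for dy in range(kh) for dx in range(kw))

def confidence_energy(Q, Q_init, I_sur, C, g):
    """E_C of Eq. (21): low confidence anchors Q to the initial estimate,
    high confidence fits the filtered Q to the surrogate I_sur."""
    g_l1_sq = np.sum(np.abs(g)) ** 2
    term_init = g_l1_sq * np.sum(((1.0 - C) * (Q - Q_init)) ** 2)
    term_sur = np.sum((C * (conv2_same(Q, g) - I_sur)) ** 2)
    return float(term_init + term_sur)

rng = np.random.default_rng(0)
Q, Q_init, I_sur = rng.random((3, 8, 8))
g = np.ones((3, 3)) / 9.0
```

With a confidence map of all ones the energy reduces to the pure surrogate term; with all zeros it reduces to the anchor term, matching the two limits of Eq. (21).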
Supplementary material for:
Perceptual error optimization for Monte Carlo rendering
VASSILLEN CHIZHOV,MIA Group, Saarland University, Max-Planck-Institut für Informatik, Germany
ILIYAN GEORGIEV,Autodesk, United Kingdom
KAROL MYSZKOWSKI,Max-Planck-Institut für Informatik, Germany
GURPRIT SINGH,Max-Planck-Institut für Informatik, Germany
In this supplemental document we discuss various details related to our general formulation from the main paper. We start with a description of the extension of our framework to the a-priori setting (Section 1). In Section 2 we describe a way in which textures can be accounted for in our horizontal approach, so that mispredictions due to multiplicative (and additive) factors are eliminated. In Section 3 we describe ways in which the runtime of iterative energy minimization methods can be improved considerably. Notably, an expression is derived allowing the energy difference due to trial swaps to be evaluated in constant time (no scaling with image size or kernel size). In the remaining sections we analyze how current a-posteriori [Heitz and Belcour 2019] (Section 5) and a-priori [Georgiev and Fajardo 2016; Heitz et al. 2019] (Section 6) state-of-the-art approaches can be related to our framework. Interpretations are discussed, major sources of error are identified, and the assumptions of the algorithms are made explicit.
1 A-PRIORI OPTIMIZATION
We extend our theory to the a-priori setting and discuss the main factors affecting the quality. The quality of a-priori approaches is determined mainly by three factors: the energy, the search space, and the optimization strategy. We discuss each of those briefly in the following paragraphs.
Our energy. We extend the a-posteriori energy from the main paper in order to handle multiple estimators involving different integrands $\mathbf{Q}_1, \ldots, \mathbf{Q}_T$, with associated weights $w_1, \ldots, w_T$:

$$E(\mathbf{S}) = \sum_{t=1}^{T} w_t \big\| \mathbf{g} * \mathbf{Q}_t(\mathbf{S}) - \mathbf{I}_t \big\|^2. \tag{1}$$

In the above, $\mathbf{g}$ would typically be a low-pass kernel (e.g., Gaussian), and $\mathbf{I}_t$ is the integral of the function used in the estimator $\mathbf{Q}_t$. Through this energy a whole set of functions can be optimized for, making the sequence more robust to different scenes and estimators that do not fit any of the considered integrands exactly. We note that the optimization derived in Section 3 below is also applicable to the minimization of the proposed energy.
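The weighted multi-integrand energy of Eq. (1) is straightforward to evaluate; a minimal numpy sketch (our own, with periodic FFT convolution chosen for brevity, which is an assumption rather than the paper's boundary handling, and with synthetic estimates and references):

```python
import numpy as np

def cconv(x, k):
    """Circular 2D convolution via the FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k)))

def apriori_energy(Qs, Is, weights, g):
    """Eq. (1): sum_t w_t || g * Q_t(S) - I_t ||^2 over T test integrands."""
    return float(sum(w * np.sum((cconv(Q, g) - I) ** 2)
                     for w, Q, I in zip(weights, Qs, Is)))

rng = np.random.default_rng(0)
g = np.zeros((8, 8)); g[:2, :2] = 0.25          # small low-pass kernel
Qs = [rng.random((8, 8)) for _ in range(3)]     # estimates for 3 integrands
Is = [rng.random((8, 8)) for _ in range(3)]     # their reference integrals
E = apriori_energy(Qs, Is, [0.5, 0.3, 0.2], g)
```

With a single integrand and unit weight, the energy collapses to the per-integrand filtered error, as expected.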
Search space. The search space plays an important role for the qualities which the optimized sequences exhibit. A more restricted search space provides more robustness and may help avoid overfitting to the considered set of integrands.

For instance, sample sets may be generated randomly within each pixel. Then, their assignment to pixels may be optimized over the space of all possible permutations. This is the setting of horizontal methods. If additionally this assignment is done within each dimension separately, it allows for an even better fit to the integrands in the energy (but may degrade the general integration properties of the sequence). The scrambling keys' search space in [Heitz et al. 2019] is a special case of the latter applied to the Sobol sequence.

Constraining the search space to points generated from low-discrepancy sequences provides further robustness and guarantees desirable integration properties of the considered sequences. Similarly to [Heitz et al. 2019], we can consider a search space of Sobol scrambling keys in order for the optimized sequence to have low discrepancy.

Ideally, such integration properties should arise directly from the energy. However, in practice the scene integrand cannot be expected to exactly match the set of considered integrands, thus extra robustness is gained through the restriction. Additionally, optimizing for many dimensions at the same time is costly, as noted in [Heitz et al. 2019], thus imposing low-discrepancy properties also helps in that regard.

Finally, imposing strict search-space constraints severely restricts the achievable error distribution. This can be alleviated by imposing the restrictions through soft penalty terms in the energy, which allows for a trade-off between, for example, blue-noise distribution and integration quality.

©2022 Copyright held by the owner/author(s). Publication rights licensed to ACM. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Transactions on Graphics, https://doi.org/10.1145/3504002.
Progressive rendering. In order to make the sequence applicable to progressive rendering, subsets of samples should be considered in the optimization. Given a sample set $S_i$ for pixel $i$, we can decompose it into nested sample sets of $1, \ldots, N$ samples: $S_{i,1} \subseteq \cdots \subseteq S_{i,N} \equiv S_i$. We denote the respective images of sample sets by $\mathbf{S}_1, \ldots, \mathbf{S}_N$.

Then an energy that also optimizes for the distribution of the error at each sample count is:

$$E(\mathbf{S}) = \sum_{t=1}^{T} \sum_{k=1}^{N} w_{t,k} \big\| \mathbf{g} * \mathbf{Q}_t(\mathbf{S}_k) - \mathbf{I}_t \big\|^2. \tag{2}$$

If the $w_{t,k}$ are set to zero for $k < N$, then the original formulation is recovered. The more general formulation imposes additional constraints on the samples, thus the quality at the full sample count may be compromised if we also require good quality at lower sample counts.

Choosing samples from $S_i$ for $S_{i,1}, \ldots, S_{i,N-1}$ (in each dimension) constitutes a vertical search space analogous to the one discussed in the main paper for a-posteriori methods. The ranking keys' optimization in [Heitz et al. 2019] is a special case of this search space for the Sobol sequence.
arXiv:2012.02344v6 [cs.GR] 5 Apr 2022
Adaptive sampling can be handled by allowing a varying number
of samples per pixel, and a corresponding energy derived from the
one above. Note that this poses further restrictions on the achievable
distribution.
Optimization strategies. Typically, the energies for a-priori methods have been optimized through simulated annealing [Georgiev and Fajardo 2016; Heitz et al. 2019]. Metaheuristics can reach very good minima, especially if runtime is not of great concern, which is the case here since the sequences are precomputed. Nevertheless, the computation still needs to be tractable. The energies in previous works are generally not cheap to evaluate. Our energies, on the other hand, can be evaluated very efficiently, especially with the optimizations in Section 3. This keeps the runtime of metaheuristics manageable and allows more complex search spaces to be considered.
Implementation details. Implementation decisions for a renderer, such as how samples are consumed, or how those are mapped to the hemisphere and light sources, affect the estimator $\mathbf{Q}$. This is important, especially when choosing $\mathbf{Q}$ for the described energies to optimize a sequence. It is possible that very small implementation changes make a previously ideal sequence useless for a specific renderer. It is important to keep this in mind when optimizing sequences using the proposed energies and when those are used in a renderer.
2 TEXTURE DEMODULATION FOR HORIZONTAL
OPTIMIZATION
Our iterative energy minimization algorithms (Alg. 1, Alg. 2, main paper) directly work with the original energy formulation, unlike error diffusion and dither-matrix halftoning, which only approximately minimize the energy. This allows textures to be handled more robustly compared to the permutation approach of Heitz and Belcour.
Reducing misprediction errors. Our horizontal approach relies on a dissimilarity metric $d(\cdot,\cdot)$ which approximates terms involving the difference $\boldsymbol{\Delta}$ due to swapping sample sets instead of pixels. This difference can be decreased, so that $d$ is a better approximation, if additional information is factored out in the energy: screen-space varying multiplicative and additive terms. Specifically, if we have a spatially varying multiplicative image $\boldsymbol{\alpha}$ and a spatially varying additive image $\boldsymbol{\beta}$:

$$\hat{\mathbf{Q}} = \boldsymbol{\alpha} \odot \mathbf{Q} + \boldsymbol{\beta}, \tag{3}$$
$$\boldsymbol{\Delta}(\pi) = \boldsymbol{\alpha} \odot \mathbf{Q}(\pi(\mathbf{S})) - \boldsymbol{\alpha} \odot \pi(\mathbf{Q}(\mathbf{S})), \tag{4}$$
$$\hat{\boldsymbol{\Delta}}(\pi) = \hat{\mathbf{Q}}(\pi(\mathbf{S})) - \pi(\hat{\mathbf{Q}}(\mathbf{S})) = \boldsymbol{\alpha} \odot \mathbf{Q}(\pi(\mathbf{S})) + \boldsymbol{\beta} - \pi\big( \boldsymbol{\alpha} \odot \mathbf{Q}(\mathbf{S}) + \boldsymbol{\beta} \big), \tag{5}$$
we can make use of this in our formulation:
$$E(\pi) = \big\| \mathbf{g} * \big( \hat{\mathbf{Q}}(\pi(\mathbf{S})) - \mathbf{I} \big) \big\|_2^2 \tag{6}$$
$$E(\pi) \leq \Big( \big\| \mathbf{g} * \big( \boldsymbol{\alpha} \odot \pi(\mathbf{Q}(\mathbf{S})) + \boldsymbol{\beta} - \mathbf{I} \big) \big\|_2 + \|\mathbf{g}\|_1 \|\boldsymbol{\Delta}\|_2 \Big)^2. \tag{7}$$

Contrast this to the original formulation where $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are not factored out:

$$E(\pi) \leq \Big( \big\| \mathbf{g} * \big( \pi(\boldsymbol{\alpha} \odot \mathbf{Q}(\mathbf{S}) + \boldsymbol{\beta}) - \mathbf{I} \big) \big\|_2 + \|\mathbf{g}\|_1 \|\hat{\boldsymbol{\Delta}}\|_2 \Big)^2. \tag{8}$$
With the new formulation it is sucient that
𝑄
𝑄
𝑄(𝜋(𝑆
𝑆
𝑆))=𝜋(𝑄
𝑄
𝑄(𝑆
𝑆
𝑆))
for
Δ
Δ
Δ
to be zero, while originally both
𝛼
𝛼
𝛼
and
𝛽
𝛽
𝛽
play a role in
Δ
Δ
Δ
becoming zero. Intuitively this means that screen space integrand
dierences due to additive and multiplicative factors do not result
in mispredictions with the new formulation, if the integrand is
assumed to be the same (locally) in screen space.
Comparison to demodulation. In the method of Heitz and Belcour the permutation is applied to the albedo-demodulated image. This preserves the property that the global minimum of the implicit energy can be found through sorting. Translated to our framework, this can be formulated as follows ($\mathbf{B}$ is a blue-noise mask optimized for a kernel $\mathbf{g}$):

$$E_{HBP}(\pi) = \big\| \pi(\mathbf{Q}(\mathbf{S})) - \mathbf{I} - \mathbf{B} \big\|_2^2 \approx \big\| \mathbf{g} * \pi(\mathbf{Q}(\mathbf{S})) - \mathbf{g} * \mathbf{I} \big\|_2^2. \tag{9}$$

We have assumed that $\boldsymbol{\beta}$ is zero, but we can also extend the method to handle an additive term $\boldsymbol{\beta}$ as in our case. The more important distinction is that while the albedo-demodulated image $\mathbf{Q}$ is used in the permutation, it is never re-modulated ($\boldsymbol{\alpha}$ is missing). Thus, this does not allow for proper handling of textures, even if it allows for modest improvements in practice. An example of a fail case is an image $\boldsymbol{\alpha}$ that is close to white noise. Then the error distribution will also be close to white noise due to the missing $\boldsymbol{\alpha}$ factor. More precisely, even if $\pi(\mathbf{Q}(\mathbf{S})) - \mathbf{I}$ is distributed as $\mathbf{B}$, this does not imply that $\boldsymbol{\alpha} \odot \pi(\mathbf{Q}(\mathbf{S})) - \mathbf{I}$ will be distributed similarly. Dropping $\boldsymbol{\alpha}$ is, however, a reasonable option if one is restricted to sorting as an optimization strategy.
We propose a modication of the original approach (and energy)
such that not only the demodulated estimator values are used, but
the blue noise mask
𝐵
𝐵
𝐵
is also demodulated. To better understand
how it is derived (and how
𝛽
𝛽
𝛽
may be integrated) we study a bound
based on the assumption that 𝛼𝑖[0,1], and Δ
Δ
Δ=0
𝐸(𝜋)=𝑔
𝑔
𝑔(𝛼
𝛼
𝛼𝜋(𝑄
𝑄
𝑄(𝑆
𝑆
𝑆))+𝛽
𝛽
𝛽)𝑔
𝑔
𝑔𝐼
𝐼
𝐼2
2(10)
𝛼
𝛼
𝛼𝜋(𝑄
𝑄
𝑄(𝑆
𝑆
𝑆))+𝛽
𝛽
𝛽𝐼
𝐼
𝐼𝐵
𝐵
𝐵2
2=(11)
𝑖
𝛼2
𝑖(𝜋(𝑄
𝑄
𝑄(𝑆
𝑆
𝑆)))𝑖+𝛽𝑖𝐼
𝑖𝐵𝑖
𝛼𝑖2(12)
𝜋(𝑄
𝑄
𝑄(𝑆
𝑆
𝑆))+𝛽
𝛽
𝛽𝐼
𝐼
𝐼𝐵
𝐵
𝐵
𝛼
𝛼
𝛼2
2
.(13)
The global minimum of the last energy (w.r.t.
𝜋
) can be found
through sorting also, since there is no spatially varying multiplica-
tive factor 𝛼
𝛼
𝛼in front of the permutation.
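The rank-matching argument can be checked numerically. The sketch below (our own illustration on synthetic 1D arrays; the variable names are ours) builds the demodulated target of Eq. (13) and verifies against brute force that sorting attains the global minimum over all permutations:

```python
import itertools
import numpy as np

def best_assignment(values, target):
    """Globally minimize ||pi(values) - target||_2^2 over permutations pi
    by matching ranks (a consequence of the rearrangement inequality)."""
    out = np.empty_like(values)
    out[np.argsort(target)] = np.sort(values)
    return out

rng = np.random.default_rng(0)
n = 6
Q = rng.random(n)                      # demodulated pixel estimates
I = rng.random(n)                      # ground-truth image
B = rng.random(n) - 0.5                # blue-noise mask values
alpha = 0.1 + 0.9 * rng.random(n)      # multiplicative (albedo) image
beta = 0.1 * rng.random(n)             # additive image

t = (I + B - beta) / alpha             # demodulated target from Eq. (13)
assigned = best_assignment(Q, t)
E_sort = np.sum((assigned - t) ** 2)
E_best = min(np.sum((np.array(p) - t) ** 2)
             for p in itertools.permutations(Q))
```

Since the multiplicative factor sits outside the permuted term, the assignment decouples per pixel and sorting suffices, which is exactly why the demodulated-mask energy remains sorting-friendly.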
Sinusoidal textures. To demonstrate texture handling (multiplicative term $\boldsymbol{\alpha}$), in the top row of Fig. 1 a white-noise texture $W$ is multiplied with a sine-wave input signal: $f(x,y) = 0.5\,(1 + \sin(x+y))\, W(x,y)$. The reference is a constant image at $0.5$. Heitz and Belcour proposed to handle such textures by applying their method on the albedo-demodulated image. While this strategy may lead to a modest improvement, it ignores the fact that the image is produced by re-modulating the albedo, which can negate that improvement. Instead,
[Figure layout: rows are multiplicative (top) and additive (bottom); columns are Input, Ours, and Heitz and Belcour [2019] with no demodulation, demodulation w/ tile size 8, and demodulation w/o tiling.]
Fig. 1. We demonstrate the importance of the extension presented in Section 2. A high-frequency sinusoidal texture is corrupted by white noise (leftmost column) multiplicatively (top row) and additively (bottom row). Contrary to Heitz and Belcour's method, our optimization distributes error as a high-quality blue-noise distribution (see the power-spectrum insets). The reference images for the top/bottom rows are respectively a flat grey and a sinusoidal image.
our horizontal iterative minimization algorithm can incorporate the
albedo explicitly using the discussed energy.
The bottom row demonstrates the eect of a non-at signal on
the error distribution (additive term
𝛽
𝛽
𝛽
). Here
𝑊
is added to a sine-
wave input signal:
𝑓(𝑥, 𝑦)=
0
.
5
(1.0+sin(𝑥+𝑦))+𝑊(𝑥, 𝑦)
. The
reference image is 0
.
5
(1+sin(𝑥+𝑦))
. Our optimization is closer
to the reference suggesting that our method can greatly outperform
the current state of the art by properly accounting for auxiliary
information, especially in regions with high-frequency textures.
Dimensional decomposition. The additive factor $\boldsymbol{\beta}$ can be used to motivate splitting the optimization over several dimensions, since the Liouville–Neumann expansion of the rendering equation is additive [Kajiya 1986]. If some dimensions are smooth (e.g., lower dimensions), then a screen-space local integrand-similarity assumption can be encoded in $d(\cdot,\cdot)$, and it will approximate $\boldsymbol{\Delta}$ better for smoother dimensions. If the optimization is applied over all dimensions at the same time, this may result in many mispredictions, since the assumption is violated for dimensions in which the integrand is less smooth in screen space (e.g., higher dimensions). We propose splitting the optimization problem: starting from lower dimensions, sequentially optimize higher dimensions while encoding a local smoothness (in screen space) assumption on the integrand in $d(\cdot,\cdot)$ (e.g., swaps limited to a small neighborhood around the pixel). This requires solving several optimization problems, but potentially reduces the number of mispredictions. Note that it does not require more rendering operations than usual.
3 IMPROVING ITERATIVE-OPTIMIZATION
PERFORMANCE
The main cost of iterative minimization methods is computing the energy for each trial swap, more specifically the required convolution and the subsequent norm computation. In the work of Analoui and Allebach an optimization has been proposed to efficiently evaluate such trial swaps, without recomputing a convolution or norm at each step, yielding a speed-up of more than 10 times. That optimization relies on the assumption that the kernel $\mathbf{g}$ is the same across screen space, so it is not applicable for spatially varying kernels. We extend the described optimization to a more general case, including spatially varying kernels. We also note some details not mentioned in the original paper.
3.1 Horizontal swaps
We will assume the most general case: instead of just swapping pixels, we consider swapping sample sets from which values are generated through $\mathbf{Q}$. This subsumes both swapping pixel values and swapping pixel values in the presence of a multiplicative factor $\boldsymbol{\alpha}$.

Single swap. The main goal is to evaluate the change of the energy $\delta$ due to a swap between the sample sets of some pixels $a, b$. More precisely, if the original sample-set image is $\mathbf{S}$ then the new sample-set image is $\mathbf{S}'$ such that $S'_a = S_b$, $S'_b = S_a$, and $S'_i = S_i$ everywhere else. This corresponds to images $\mathbf{Q} = \mathbf{Q}(\mathbf{S})$ and $\mathbf{Q}' = \mathbf{Q}(\mathbf{S}')$. The two images differ only in the pixels with indices $a$ and $b$. Let:

$$\delta_a = Q'_a - Q_a = Q_a(S_b) - Q_a(S_a), \tag{14}$$
$$\delta_b = Q'_b - Q_b = Q_b(S_a) - Q_b(S_b). \tag{15}$$
We will also denote the convolved images $\tilde{\mathbf{Q}} = \mathbf{g} * \mathbf{Q}$ and $\tilde{\mathbf{Q}}' = \mathbf{g} * \mathbf{Q}'$, and the filtered error $\boldsymbol{\epsilon} = \tilde{\mathbf{Q}} - \mathbf{I}$. Specifically:

$$\tilde{Q}_i = \sum_{j \in \mathbb{Z}^2} Q_j\, g_{i-j}, \qquad \tilde{Q}'_i = \tilde{Q}_i + \delta_a g_{i-a} + \delta_b g_{i-b}. \tag{16}$$
We want to be able to eciently evaluate
𝛿=˜
𝑄
𝑄
𝑄𝐼
𝐼
𝐼2˜
𝑄
𝑄
𝑄𝐼
𝐼
𝐼2
,
since in the iterative minimization algorithms the candidate with the
minimum
𝛿
is kept. Using the above expressions for
˜
𝑄
𝑖
we rewrite
𝛿as:
𝛿=˜
𝑄
𝑄
𝑄𝐼
𝐼
𝐼2˜
𝑄
𝑄
𝑄𝐼
𝐼
𝐼2=(17)
𝑖Z2˜
𝑄𝑖𝐼𝑖+𝛿𝑎𝑔𝑖𝑎+𝛿𝑏𝑔𝑖𝑏2˜
𝑄
𝑄
𝑄𝐼
𝐼
𝐼2=(18)
2
𝑖Z2˜
𝑄𝑖𝐼𝑖, 𝛿𝑎𝑔𝑖𝑎+𝛿𝑏𝑔𝑖𝑏+
𝑖Z2𝛿𝑎𝑔𝑖𝑎+𝛿𝑏𝑔𝑖𝑏2=(19)
2𝛿𝑎,
𝑖Z2
𝜖𝑖𝑔𝑖𝑎+2𝛿𝑏,
𝑖Z2
𝜖𝑖𝑔𝑖𝑏+
𝛿2
𝑎,
𝑖Z2
𝑔𝑖𝑎𝑔𝑖𝑎+𝛿2
𝑏,
𝑖Z2
𝑔𝑖𝑏𝑔𝑖𝑏+
2𝛿𝑎𝛿𝑏,
𝑖Z2
𝑔𝑖𝑎𝑔𝑖𝑏=
(20)
2𝛿𝑎,𝐶𝑔
𝑔
𝑔,𝜖
𝜖
𝜖(𝑎)+2𝛿𝑏,𝐶𝑔
𝑔
𝑔,𝜖
𝜖
𝜖(𝑏)+
(𝛿2
𝑎+𝛿2
𝑏),𝐶𝑔
𝑔
𝑔,𝑔
𝑔
𝑔(0)+2𝛿𝑎𝛿𝑏,𝐶𝑔
𝑔
𝑔,𝑔
𝑔
𝑔(𝑏𝑎),(21)
where
𝐶𝑓 ,ℎ(𝑥)=𝑖Z2𝑓(𝑖𝑥)(𝑖)
is the cross-correlation of
𝑓
and
. We have reduced the computation of
𝛿
to the sum of only 4
terms. Assuming that
𝐶𝑔
𝑔
𝑔,𝑔
𝑔
𝑔
is known (it can be precomputed once for
a known kernel) and that
𝐶𝑔
𝑔
𝑔,𝜖
𝜖
𝜖
is known (it can be recomputed after
a sucient amount of swaps have been accepted), then evaluating
a trial swap takes constant time (it does not scale in the size of the
image or the size of the kernel).
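The constant-time expression of Eq. (21) can be verified numerically. The sketch below is our own illustration: it assumes a periodic (torus) domain so that the sums over $\mathbb{Z}^2$ become exact circular sums evaluable with FFTs, which is a simplification rather than the paper's boundary handling. It precomputes $C_{\mathbf{g},\mathbf{g}}$ and $C_{\mathbf{g},\boldsymbol{\epsilon}}$ and compares the O(1) delta against brute-force recomputation of the energy:

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 16
g = np.zeros((H, W))
g[:3, :3] = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16.0  # tent kernel

def cconv(x, k):   # circular convolution: (k * x)(i) = sum_j x_j k_{i-j}
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k)))

def corr(x, k):    # C_{k,x}(a) = sum_i x_i k(i - a), circular
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.conj(np.fft.fft2(k))))

Q = rng.random((H, W))
I = rng.random((H, W))
eps = cconv(Q, g) - I          # filtered error image
C_ge = corr(eps, g)            # C_{g,eps}, precomputed once
C_gg = corr(g, g)              # kernel autocorrelation, precomputed once

# Trial swap of the values of pixels a and b (Eq. 21, constant time).
a, b = (2, 3), (10, 7)
da, db = Q[b] - Q[a], Q[a] - Q[b]
delta_fast = (2 * da * C_ge[a] + 2 * db * C_ge[b]
              + (da ** 2 + db ** 2) * C_gg[0, 0]
              + 2 * da * db * C_gg[(b[0] - a[0]) % H, (b[1] - a[1]) % W])

# Brute force: apply the swap and recompute the full energy.
Q2 = Q.copy()
Q2[a], Q2[b] = Q[b], Q[a]
delta_brute = (np.sum((cconv(Q2, g) - I) ** 2)
               - np.sum((cconv(Q, g) - I) ** 2))
```

On the torus the two quantities agree to machine precision, while the fast path touches only four precomputed values per trial.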
Multiple accepted swaps. It may be desirable to avoid recomputing $C_{\mathbf{g},\boldsymbol{\epsilon}}$ even upon accepting a trial swap. For that purpose we extend the strategy from [Analoui and Allebach 1992] for computing $C_{\mathbf{g},\boldsymbol{\epsilon}^n}$, where $\boldsymbol{\epsilon}^n$ is the error image after $n$ swaps have been accepted:

$$\{(\delta_{a_1}, \delta_{b_1}), \ldots, (\delta_{a_n}, \delta_{b_n})\}. \tag{22}$$

This implies $\tilde{Q}^n_i = \tilde{Q}_i + \sum_{k=1}^{n} \big( \delta_{a_k} g_{i-a_k} + \delta_{b_k} g_{i-b_k} \big)$, and consequently:

$$C_{\mathbf{g},\boldsymbol{\epsilon}^n}(x) = \tag{23}$$
$$\sum_{i \in \mathbb{Z}^2} \Big( \tilde{Q}_i - I_i + \sum_{k=1}^{n} \big( \delta_{a_k} g_{i-a_k} + \delta_{b_k} g_{i-b_k} \big) \Big)\, g_{i-x} = \tag{24}$$
$$C_{\mathbf{g},\boldsymbol{\epsilon}}(x) + \sum_{k=1}^{n} \big( \delta_{a_k} C_{\mathbf{g},\mathbf{g}}(x - a_k) + \delta_{b_k} C_{\mathbf{g},\mathbf{g}}(x - b_k) \big). \tag{25}$$

This allows avoiding the recomputation of $C_{\mathbf{g},\boldsymbol{\epsilon}}$ after every accepted swap; instead, the energy difference for the $(n{+}1)$-st trial swap with trial differences $\delta_a, \delta_b$ is:

$$\delta^{n+1} = \|\tilde{\mathbf{Q}}^{n+1} - \mathbf{I}\|^2 - \|\tilde{\mathbf{Q}}^n - \mathbf{I}\|^2 \tag{26}$$
$$= 2 \delta_a\, C_{\mathbf{g},\boldsymbol{\epsilon}^n}(a) + 2 \delta_b\, C_{\mathbf{g},\boldsymbol{\epsilon}^n}(b) + (\delta_a^2 + \delta_b^2)\, C_{\mathbf{g},\mathbf{g}}(0) + 2 \delta_a \delta_b\, C_{\mathbf{g},\mathbf{g}}(b-a), \tag{27}$$
where $C_{\mathbf{g},\boldsymbol{\epsilon}^n}$ is computed from $C_{\mathbf{g},\boldsymbol{\epsilon}}$ and $C_{\mathbf{g},\mathbf{g}}$ as derived in Eq. (25). This computation scales only in the number of swaps accepted since the last recomputation of $C_{\mathbf{g},\boldsymbol{\epsilon}}$. We also note that $C_{\mathbf{g},\mathbf{g}}(x-y)$ evaluates to zero if $x-y$ is outside of the support of $C_{\mathbf{g},\mathbf{g}}$. Additional optimizations have been devised based on this fact [Analoui and Allebach 1992].
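The lazy update of Eq. (25) can likewise be checked: after accepting several swaps, adding shifted copies of $C_{\mathbf{g},\mathbf{g}}$ reproduces a freshly recomputed $C_{\mathbf{g},\boldsymbol{\epsilon}^n}$. A sketch under the same periodic-boundary assumption as before (our illustration, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)
H = W = 16
g = np.zeros((H, W))
g[:3, :3] = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16.0

def cconv(x, k):
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k)))

def corr(x, k):    # C_{k,x}(a) = sum_i x_i k(i - a), circular
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.conj(np.fft.fft2(k))))

Q = rng.random((H, W))
I = rng.random((H, W))
C_gg = corr(g, g)
C_ge = corr(cconv(Q, g) - I, g)

# Accept two swaps; update C_{g,eps} lazily via Eq. (25) instead of
# recomputing the correlation from scratch.
for a, b in [((1, 2), (9, 4)), ((5, 5), (12, 0))]:
    da, db = Q[b] - Q[a], Q[a] - Q[b]
    Q[a], Q[b] = Q[b], Q[a]
    # C_gg(x - a_k), as a function of x, is C_gg rolled by a_k.
    C_ge += (da * np.roll(C_gg, a, axis=(0, 1))
             + db * np.roll(C_gg, b, axis=(0, 1)))

C_ge_fresh = corr(cconv(Q, g) - I, g)   # recomputation for comparison
```

Each accepted swap costs two shifted additions of the (precomputed) kernel autocorrelation, so the amortized cost stays far below a full convolution.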
3.2 Vertical swaps
In the vertical setting swaps happen only within the pixel itself, that is, $\delta_a = Q_a(S'_a) - Q_a(S_a)$ and consequently $\tilde{Q}'_i = \tilde{Q}_i + \delta_a g_{i-a}$. Computing the difference in the energies for the $(n{+}1)$-st swap:

$$\delta^{n+1} = \|\tilde{\mathbf{Q}}^{n+1} - \mathbf{I}\|^2 - \|\tilde{\mathbf{Q}}^n - \mathbf{I}\|^2 \tag{28}$$
$$= \sum_{i \in \mathbb{Z}^2} \big( \tilde{Q}^n_i - I_i + \delta_a g_{i-a} \big)^2 - \|\tilde{\mathbf{Q}}^n - \mathbf{I}\|^2 \tag{29}$$
$$= 2 \sum_{i \in \mathbb{Z}^2} \big( \tilde{Q}^n_i - I_i \big)\, \delta_a g_{i-a} + \sum_{i \in \mathbb{Z}^2} \big( \delta_a g_{i-a} \big)^2 \tag{30}$$
$$= 2 \delta_a \sum_{i \in \mathbb{Z}^2} \epsilon^n_i g_{i-a} + \delta_a^2 \sum_{i \in \mathbb{Z}^2} g_{i-a} g_{i-a} \tag{31}$$
$$= 2 \delta_a\, C_{\mathbf{g},\boldsymbol{\epsilon}^n}(a) + \delta_a^2\, C_{\mathbf{g},\mathbf{g}}(0). \tag{32}$$

The corresponding expression for $C_{\mathbf{g},\boldsymbol{\epsilon}^n}$ is:

$$C_{\mathbf{g},\boldsymbol{\epsilon}^n}(x) = C_{\mathbf{g},\boldsymbol{\epsilon}}(x) + \sum_{k=1}^{n} \delta_{a_k} C_{\mathbf{g},\mathbf{g}}(x - a_k). \tag{33}$$
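The vertical counterpart, Eq. (32), reduces to just two terms. The same periodic-boundary sketch as before (our own simplifying assumption) verifies it against a brute-force energy recomputation:

```python
import numpy as np

rng = np.random.default_rng(2)
H = W = 16
g = np.zeros((H, W))
g[:3, :3] = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16.0

def cconv(x, k):
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k)))

def corr(x, k):
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.conj(np.fft.fft2(k))))

Q = rng.random((H, W))
I = rng.random((H, W))
C_ge = corr(cconv(Q, g) - I, g)   # C_{g,eps}
C_gg0 = corr(g, g)[0, 0]          # C_{g,g}(0)

a = (6, 9)
delta_a = rng.random() - 0.5      # re-estimating pixel a alone
delta_fast = 2 * delta_a * C_ge[a] + delta_a ** 2 * C_gg0   # Eq. (32)

Q2 = Q.copy()
Q2[a] += delta_a
delta_brute = (np.sum((cconv(Q2, g) - I) ** 2)
               - np.sum((cconv(Q, g) - I) ** 2))
```

Since only one pixel changes, the cross term of the horizontal case disappears and each vertical trial costs two multiplies and one lookup.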
3.3 Multiple simultaneous updates
If the search space is ignored and the formulation is analyzed in an abstract setting, it becomes clear that the vertical approach corresponds to an update of a single pixel, while the horizontal approach corresponds to an update of two pixels at the same time. This can be generalized further. Let $N$ different pixels be updated per trial, and let there be $n$ trials that have been accepted since $C_{\mathbf{g},\boldsymbol{\epsilon}}$ was last updated. Let the pixels to be updated in the current trial be $a^{n+1}_1, \ldots, a^{n+1}_N$, and the accepted update at step $k$ be at pixels $a^k_1, \ldots, a^k_N$. Let $\mathbf{Q}^0 =$