
Perceptual error optimization for Monte Carlo rendering

VASSILLEN CHIZHOV, MIA Group, Saarland University, Max-Planck-Institut für Informatik, Germany
ILIYAN GEORGIEV, Autodesk, United Kingdom
KAROL MYSZKOWSKI, Max-Planck-Institut für Informatik, Germany
GURPRIT SINGH, Max-Planck-Institut für Informatik, Germany

[Figure 1: two scenes, each shown as "4-spp average", "Ours w.r.t. surrogate", and "Ours w.r.t. ground truth", with tiled error-power-spectrum insets.]

Fig. 1. We devise a perceptually based model to optimize the error of Monte Carlo renderings. Here we show our vertical iterative minimization algorithm from Section 4.1: Given 4 input samples per pixel (spp), it selects a subset of them to produce an image with substantially improved visual fidelity over a simple 4-spp average. The optimization is guided by a surrogate image obtained by regularizing the noisy input; we also show using the ground-truth image as a guide. The power spectrum of the image error, computed on 32×32-pixel tiles, indicates that our method distributes pixel error with locally blue-noise characteristics.

Synthesizing realistic images involves computing high-dimensional light-transport integrals. In practice, these integrals are numerically estimated via Monte Carlo integration. The error of this estimation manifests itself as conspicuous aliasing or noise. To ameliorate such artifacts and improve image fidelity, we propose a perception-oriented framework to optimize the error of Monte Carlo rendering. We leverage models based on human perception from the halftoning literature. The result is an optimization problem whose solution distributes the error as visually pleasing blue noise in image space. To find solutions, we present a set of algorithms that provide varying trade-offs between quality and speed, showing substantial improvements over prior state of the art. We perform evaluations using quantitative and perceptual error metrics, and provide extensive supplemental material to demonstrate the perceptual improvements achieved by our methods.

CCS Concepts: • Computing methodologies → Ray tracing; Image processing.

Additional Key Words and Phrases: Monte Carlo, rendering, sampling, perceptual error, blue noise, halftoning, dithering, error diffusion

ACM Reference Format:
Vassillen Chizhov, Iliyan Georgiev, Karol Myszkowski, and Gurprit Singh. 2022. Perceptual error optimization for Monte Carlo rendering. ACM Trans. Graph. 41, 3, Article 26 (June 2022), 17 pages. https://doi.org/10.1145/3504002

Authors’ addresses: Vassillen Chizhov, MIA Group, Saarland University, and Max-Planck-Institut für Informatik, Saarbrücken, Germany; Iliyan Georgiev, Autodesk, United Kingdom; Karol Myszkowski, Max-Planck-Institut für Informatik, Saarbrücken, Germany; Gurprit Singh, Max-Planck-Institut für Informatik, Saarbrücken, Germany.

© 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM. This is the author’s version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Transactions on Graphics, https://doi.org/10.1145/3504002.

1 INTRODUCTION

Monte Carlo sampling produces approximation error. In rendering, this error can cause visually displeasing image artifacts, unless control is exerted over the correlation of the individual pixel estimates. A standard approach is to decorrelate these estimates by randomizing the samples independently for every pixel, turning potential structured artifacts into white noise.

In digital halftoning, the error induced by quantizing continuous-tone images has been studied extensively. Such studies have shown that a blue-noise distribution of the quantization error is perceptually optimal [Ulichney 1987], achieving substantially higher image fidelity than a white-noise distribution. Recent works have proposed empirical means to transfer these ideas to image synthesis [Georgiev and Fajardo 2016; Heitz and Belcour 2019; Heitz et al. 2019; Ahmed and Wonka 2020]. Instead of randomizing the pixel estimates, these methods introduce negative correlation between neighboring pixels, exploiting the local smoothness in images to push the estimation error to the high-frequency spectral range.

We propose a theoretical formulation of perceptual error for image synthesis which unifies prior methods in a common framework and formally justifies the desire for blue-noise error distribution. We extend the comparatively simpler problem of digital halftoning [Lau and Arce 2007], where the ground-truth image is given, to the substantially more complex one of rendering, where the ground truth is the sought result and thus unavailable. Our formulation bridges the gap between multi-tone halftoning and rendering by interpreting Monte Carlo estimates for a pixel as its admissible ‘quantization levels’. This insight allows virtually any halftoning method to be adapted to rendering. We demonstrate this for the three main classes

ACM Trans. Graph., Vol. 41, No. 3, Article 26. Publication date: June 2022.

arXiv:2012.02344v6 [cs.GR] 5 Apr 2022


of halftoning algorithms: dither-mask halftoning, error diffusion halftoning, and iterative energy minimization halftoning.

Existing methods [Georgiev and Fajardo 2016; Heitz and Belcour 2019; Heitz et al. 2019] can be seen as variants of dither-mask halftoning. They distribute pixel error according to masks that are optimized w.r.t. a target kernel, typically a Gaussian. The kernel can be interpreted as an approximation to the human visual system’s point spread function [Daly 1987; Pappas and Neuhoff 1999]. We revisit the kernel-based perceptual model from halftoning [Sullivan et al. 1991; Analoui and Allebach 1992; Pappas and Neuhoff 1999] and adapt it to rendering. The resulting energy can be directly used for optimizing Monte Carlo error distribution without the need for a mask. This formulation helps us expose the underlying assumptions of existing methods and quantify their limitations. In summary:

● We formulate an optimization problem for rendering error by leveraging kernel-based perceptual models from halftoning.
● Our formulation unifies prior blue-noise error distribution methods and makes all their assumptions explicit, outlining general guidelines for devising new methods in a principled manner.
● Unlike prior methods, our formulation simultaneously optimizes for both the magnitude and the image distribution of pixel error.
● We devise four different practical algorithms based on iterative minimization, error diffusion, and dithering from halftoning.
● We demonstrate substantial visual improvements over prior art, while using the same input rendering data.

2 RELATED WORK

Our work focuses on reducing and optimizing the distribution of Monte Carlo pixel-estimation error. In this section we review prior work with similar goals in digital halftoning (Section 2.1) and image synthesis guided by energy-based (Section 2.2) and perception-based (Section 2.3) error metrics. We achieve error reduction through careful sample placement and processing, and discuss related rendering approaches (Section 2.4).

2.1 Digital halftoning

Digital halftoning [Lau and Arce 2007] involves creating the illusion of continuous-tone images through the arrangement of binary elements; various algorithms target different display devices. Bayer [1973] developed the widely used dispersed-dot ordered-dither patterns. Allebach and Liu [1976] introduced the use of randomness in clustered-dot ordered dithering. Ulichney [1987] introduced blue-noise patterns that yield better perceptual quality, and Mitsa and Parker [1991] mimicked those patterns to produce dither arrays (i.e., masks) with high-frequency characteristics. Sullivan et al. [1991] developed a Fourier-domain energy function to obtain visually optimal halftone patterns; the optimality is defined w.r.t. computational models of the human visual system. Analoui and Allebach [1992] devised a practical algorithm for blue-noise dithering through a spatial-domain interpretation of Sullivan et al.’s model. Their approach was later refined by Pappas and Neuhoff [1999].

The void-and-cluster algorithm [Ulichney 1993] uses a Gaussian kernel to create dither masks with isotropic blue-noise distribution. This approach has motivated various structure-aware halftoning algorithms in graphics [Ostromoukhov 2001; Pang et al. 2008; Chang et al. 2009]. In the present work, we leverage the kernel-based model [Analoui and Allebach 1992; Pappas and Neuhoff 1999] in the context of Monte Carlo rendering [Kajiya 1986].

2.2 Quantitative error assessment in rendering

It is convenient to measure the error of a rendered image as a single value; vector norms like the mean squared error (MSE) are most commonly used. However, it is widely acknowledged that such simple metrics do not accurately reflect visual quality as they ignore the perceptually important spatial arrangement of pixels. Various theoretical frameworks have been developed in the spatial [Niederreiter 1992; Kuipers and Niederreiter 1974] and Fourier [Singh et al. 2019] domains to understand the error reported through these metrics. The error spectrum ensemble [Celarek et al. 2019] measures the frequency-space distribution of the error.

Many denoising methods [Zwicker et al. 2015] employ the aforementioned metrics to obtain noise-free results from noisy renderings. Even if the most advanced denoising techniques driven by such metrics can efficiently steer adaptive sampling [Chaitanya et al. 2017; Kuznetsov et al. 2018; Kaplanyan et al. 2019], they locally determine the number of samples per pixel, ignoring the aspect of their specific layout in screen space.

Our optimization framework employs a perceptual MSE-based metric that accounts for both the magnitude and the spatial distribution of pixel-estimation error. We argue that the spatial sample layout plays a crucial role in the perception of a rendered image; the most commonly used error metrics do not capture this aspect.

2.3 Perceptual error assessment in rendering

The study of the human visual system (HVS) is still ongoing, and well understood are mostly the early stages of the visual pathways from the eye optics, through the retina, to the visual cortex. This limits the scope of existing HVS computational models used in imaging and graphics. Such models should additionally be computationally efficient and generalize over the simplistic stimuli that have been used in their derivation through psychophysical experiments.

Contrast sensitivity function. The contrast sensitivity function (CSF) is one of the core HVS models that fulfills the above conditions and comprehensively characterizes overall optical [Westheimer 1986; Deeley et al. 1991] and neural [Souza et al. 2011] processes in detecting contrast visibility as a function of spatial frequency. While originally modeled as a band-pass filter [Barten 1999; Daly 1992], the CSF’s shape changes towards a low-pass filter with retinal eccentricity [Robson and Graham 1981; Peli et al. 1991] and reduced luminance adaptation in scotopic and mesopic levels [Wuerger et al. 2020]. Low-pass characteristics are also inherent for chromatic CSFs [Mullen 1985; Wuerger et al. 2020; Bolin and Meyer 1998]. In many practical imaging applications, e.g., JPEG compression [Rashid et al. 2005], rendering [Ramasubramanian et al. 1999], or halftoning [Pappas and Neuhoff 1999], the CSF is modeled as a low-pass filter, which also allows for better control of image intensity. By normalizing such a CSF by the maximum contrast-sensitivity value, a unitless function akin to the modulation transfer function (MTF) can be derived [Daly 1987; Mannos and Sakrison 1974; Mantiuk et al. 2005; Sullivan et al. 1991; Souza et al. 2011] that



after transforming from the frequency to the spatial domain results in the point spread function (PSF) [Analoui and Allebach 1992; Pappas and Neuhoff 1999]. Following Pappas and Neuhoff [1999], we approximate such a PSF by a Gaussian filter; the resulting error is practically negligible for a pixel density of 300 dots per inch (dpi) and an observer-to-screen distance larger than 60 cm.

Advanced quality metrics. More costly, and often less robust, modeling of the HVS beyond the CSF is performed in advanced quality metrics [Lubin 1995; Daly 1992; Mantiuk et al. 2011]. Such metrics have been adapted to rendering to guide the computation to image regions where the visual error is most strongly perceived [Bolin and Meyer 1995, 1998; Ramasubramanian et al. 1999; Ferwerda et al. 1996; Myszkowski 1998; Volevich et al. 2000]. An important application is visible noise reduction in path tracing via content-adaptive sample-density control [Bolin and Meyer 1995, 1998; Ramasubramanian et al. 1999]. Our framework enables significant reduction of noise visibility for the same sampling budget.

2.4 Blue-noise error distribution in rendering

Mitchell [1991] first observed that high-frequency error distribution is desirable for stochastic rendering. Only recently, Georgiev and Fajardo [2016] adopted techniques from halftoning to correlate pixel samples in screen space and distribute path-tracing error as blue noise, with substantial perceptual quality improvements. Heitz et al. [2019] built on this idea to develop a progressive quasi-Monte Carlo sampler that further improves quality. Ahmed and Wonka [2020] proposed a technique to coordinate quasi-Monte Carlo samples in screen space inspired by error diffusion.

Motivated by the results of Georgiev and Fajardo [2016], Heitz and Belcour [2019] devised a method to directly optimize the distribution of pixel estimates, without operating on individual samples. Their pixel permutation strategy fits the initially white-noise pixel intensities to a prescribed blue-noise mask. This approach scales well with sample count and dimension, though its reliance on prior pixel estimates makes it practical only for animation rendering, where it is susceptible to quality degradation.

We propose a perceptual error framework that unifies these two general approaches, exposing the assumptions of existing methods and providing guidelines to alleviate some of their drawbacks.

3 PERCEPTUAL ERROR MODEL

Our aim is to produce Monte Carlo renderings that, at a fixed sampling rate, are perceptually as close to the ground truth as possible. This goal requires formalizing the perceptual image error along with an optimization problem that minimizes it. In this section, we build a perceptual model upon the extensive studies done in the halftoning literature. We will discuss how to efficiently solve the resulting optimization problem in Section 4.

Given a ground-truth image 𝐼 and its quantized or stochastic approximation 𝑄, we denote the (signed) error image by

𝜖 = 𝑄 − 𝐼. (1)

To minimize the error, it is convenient to quantify it as a single value. A common approach is to take the ℒ1, ℒ2, or ℒ∞ norm of the

[Figure 2: columns show the error image, its power spectrum, the kernel spectrum, and their product, for white-noise (top) and blue-noise (bottom) error.]

Fig. 2. Error images 𝜖w and 𝜖b with respective white-noise, 𝜖̂w², and blue-noise, 𝜖̂b², power spectra. For a low-pass kernel 𝑔 modeling the PSF of the HVS (here a Gaussian with std. dev. 𝜎 = 1), the product of its spectrum 𝑔̂² with 𝜖̂b² has lower magnitude than the product with 𝜖̂w². This corresponds to lower perceptual sensitivity to 𝜖b, even though 𝜖w has the same amplitude as it is obtained by randomly permuting the pixels of 𝜖b.

image 𝜖 interpreted as a vector. Such simple metrics are permutation-invariant, i.e., they account for the magnitudes of individual pixel errors but not for their distribution over the image. This distribution is an important factor for the perceived fidelity, since contrast perception is an inherently spatial characteristic of the HVS (Section 2.3). Our model is based on perceptual halftoning metrics that capture both the magnitude and the distribution of error.

3.1 Motivation

Halftoning metrics model the processing done by the HVS as a convolution of the error image 𝜖 with a kernel 𝑔:

𝐸 = ‖𝑔 ∗ 𝜖‖₂² = ‖𝑔̂ ⊙ 𝜖̂‖₂² = ⟨𝑔̂², 𝜖̂²⟩. (2)

The convolution is equivalent to the element-wise product of the corresponding Fourier spectra 𝑔̂ and 𝜖̂, whose 2-norm in turn equals the inner product of the power-spectra images 𝑔̂² and 𝜖̂². Sullivan et al. [1991] optimized the error image 𝜖 to minimize the error (2) w.r.t. a kernel 𝑔 that approximates the HVS’s modulation transfer function 𝑔̂ (MTF) [Daly 1987]. Analoui and Allebach [1992] used a similar model in the spatial domain with a kernel that approximates the PSF¹ of the human eye. That kernel is low-pass, and the optimization naturally yields a blue-noise² distribution in the error image [Analoui and Allebach 1992], as we show later in Fig. 5. The blue-noise distribution can thus be seen as a byproduct of the optimization, which pushes the spectral components of the error to the frequencies least visible to the human eye (see Fig. 2).

To better understand the spatial aspects of contrast sensitivity in the HVS, the MTF is usually modeled over a range of viewing distances [Daly 1992]. This is done to account for the fact that with increasing viewer distance, spatial frequencies in the image are

¹The MTF is the magnitude of the Fourier transform of the PSF.
²The term “blue noise” is often used loosely to refer to any isotropic spectrum with minimal low-frequency content and no concentrated energy spikes.



[Figure 3: noise images blurred with increasing 𝜎 = 0, 0.25, 0.5, 1.]

Fig. 3. The appearance of blue noise (left images) converges to a constant image faster than white noise (right images) with increasing observer distance, here emulated via the standard deviation 𝜎 of a Gaussian kernel. We provide a formal connection between 𝜎 and the viewing distance in Section 6.

projected to higher spatial frequencies onto the retina. These frequencies eventually become invisible, filtered out by the PSF which expands its corresponding kernel in image space. We recreate this experiment to see the impact of distance on the image error. In Fig. 3, we convolve white- and blue-noise distributions with a Gaussian kernel of increasing standard deviation corresponding to increasing observer-to-screen distance. The high-frequency blue-noise distribution reaches a homogeneous state (where the tone appears constant) faster compared to the all-frequency white noise. This means that high-frequency error becomes indiscernible at closer viewing distances, where the HVS ideally has not yet started filtering out actual image detail which is typically low- to mid-frequency. In Section 6 we discuss how the kernel’s standard deviation encodes the viewing distance w.r.t. the screen resolution.

3.2 Our model

In rendering, the value of each pixel 𝑖 is a light-transport integral. Point-sampling its integrand with a sample set 𝑆 yields a pixel estimate 𝑄𝑖(𝑆). The signed pixel error is thus a function of the sample set: 𝜖𝑖(𝑆) = 𝑄𝑖(𝑆) − 𝐼𝑖, where 𝐼𝑖 is the reference (i.e., ground-truth) pixel value. The error of the entire image can be written as

𝜖(𝑆) = 𝑄(𝑆) − 𝐼, (3)

where 𝑆 = {𝑆1, . . . , 𝑆𝑁} is an “image” containing the sample set for all 𝑁 pixels. With these definitions, we can express the perceptual error in Eq. (2) for the case of Monte Carlo rendering as a function of the sample-set image 𝑆, given a kernel 𝑔:

𝐸(𝑆) = ‖𝑔 ∗ 𝜖(𝑆)‖₂². (4)

Our goal is to minimize the perceptual error (4). We formulate this task as an optimization problem:

min_{𝑆∈Ω} 𝐸(𝑆) = min_{𝑆∈Ω} ‖𝑔 ∗ (𝑄(𝑆) − 𝐼)‖₂². (5)

The minimizing sample-set image 𝑆 yields an image estimate 𝑄(𝑆) that is closest to the reference 𝐼 w.r.t. the kernel 𝑔. The search space Ω is the set of all possible locations for every sample of every pixel. The total number of samples in 𝑆 is typically bounded by a given target sampling budget. Practical considerations may also restrict the search space Ω, as we will exemplify in the following section.

Note that the classical MSE metric corresponds to using a zero-width (i.e., one-pixel) kernel 𝑔 in Eq. (4). However, the MSE accounts only for the magnitude of the error 𝜖, while using wider kernels (such as the PSF) accounts for both magnitude and distribution. Consequently, while the MSE can be minimized by optimizing pixels independently, minimizing the perceptual error requires coordination between pixels. In the following section, we devise strategies for solving this optimization problem.

4 DISCRETE OPTIMIZATION

In our optimization problem (5), the search space for each sample in every pixel is a high-dimensional unit hypercube. Every point in this so-called primary sample space maps to a light-transport path in the scene [Pharr et al. 2016]. Optimizing for the sample-set image 𝑆 thus entails evaluating the contributions 𝑄(𝑆) of all corresponding paths. This evaluation is costly, and for any non-trivial scene, 𝑄 is a function with complex shape and many discontinuities. This precludes us from studying all (uncountably infinite) sample locations in practice.

To make the problem tractable, we restrict the search in each pixel to a finite number of (pre-defined) sample sets. We devise two variants of the resulting discrete optimization problem, which differ in their definition of the global search space Ω. In the first variant, each pixel has a separate list of sample sets to choose from (“vertical” search space). The setting is similar to that of (multi-tone) halftoning [Lau and Arce 2007], which allows us to import classical optimization techniques from that field, such as iterative minimization, error diffusion, and mask-based dithering. In the second variant, each pixel has one associated sample set, and the search space comprises permutations of these assignments (“horizontal” search space). We develop a greedy iterative optimization method for this second variant.

In contrast to halftoning, in our setting the ground-truth image 𝐼, required to compute the error image 𝜖 during optimization, is not readily available. Below we describe our algorithms assuming the ground truth is available; in Section 5 we will discuss how to substitute it with a surrogate to make the algorithms practical.

4.1 Vertical search space

Our first variant considers a “vertical” search space where the sample set for each of the 𝑁 image pixels is one of 𝑀 given sets:³

Ω = {𝑆 = {𝑆1, . . . , 𝑆𝑁} : 𝑆𝑖 ∈ {𝑆𝑖,1, . . . , 𝑆𝑖,𝑀}}. (6)

The objective is to find a sample set 𝑆𝑖 for every pixel 𝑖 such that all resulting pixel estimates together minimize the perceptual error (4).

[Inline figure: per-pixel stacks of candidate estimates 𝑄𝑖,𝑗 with one selection 𝑂𝑖 each.]

This is equivalent to directly optimizing over the 𝑀 possible estimates 𝑄𝑖,1, . . . , 𝑄𝑖,𝑀 for each pixel, with 𝑄𝑖,𝑗 = 𝑄𝑖(𝑆𝑖,𝑗). These estimates can be obtained by pre-rendering a stack of 𝑀 images 𝑄𝑗 = {𝑄1,𝑗, . . . , 𝑄𝑁,𝑗}, for 𝑗 = 1..𝑀. The resulting minimization problem reads:

min_{𝑂 : 𝑂𝑖∈{𝑄𝑖,1,...,𝑄𝑖,𝑀}} ‖𝑔 ∗ (𝑂 − 𝐼)‖₂². (7)

³For notational simplicity, and without loss of generality, we assume that the number of candidate sample sets 𝑀 is the same for all pixels; in practice it can vary per pixel.



This problem is almost identical to that of multi-tone halftoning. The difference is that in our setting the “quantization levels”, i.e., the pixel estimates, are distributed non-uniformly and vary per pixel as they are not fixed but are the result of point-sampling a light-transport integral. This similarity allows us to directly apply existing optimization techniques from halftoning. We consider three such methods, which we outline in Alg. 1 and describe next.

Iterative minimization. State-of-the-art halftoning methods attack the problem (7) directly via greedy iterative minimization [Analoui and Allebach 1992; Pappas and Neuhoff 1999]. After initializing every pixel to a random quantization level, we traverse the image in serpentine order (as is standard practice in halftoning) and for each pixel choose the level that minimizes the energy. Several full-image iterations are performed; in our experiments convergence to a local minimum is achieved within 10–20 iterations.

As a further improvement, the optimization can be terminated when no pixels are updated within one full iteration, or when the perceptual-error reduction rate drops below a certain threshold. Traversing the pixels in random order allows terminating at any point but converges slightly slower.

Error diffusion. A classical halftoning algorithm, error diffusion scans the image pixel by pixel, snapping each reference value to the closest quantization level and distributing the resulting pixel error to yet-unprocessed nearby pixels according to a given kernel 𝜅. We use the empirically derived kernel of Floyd and Steinberg [1976] which has been shown to produce an output that approximately minimizes Eq. (7) [Hocevar and Niger 2008]. Error diffusion is faster than iterative minimization but yields less optimal solutions.

Dithering. The fastest halftoning approach quantizes pixel values using thresholds stored in a pre-computed dither mask (or matrix) [Spaulding et al. 1997]. For each pixel, the two quantization levels that tightly envelop the reference value (in terms of brightness) are found, and one of the two is chosen based on the threshold assigned to the pixel by the mask.

Dithering can be understood as performing the perceptual error minimization in two steps. First, an offline optimization encodes the error distribution optimal for the target kernel 𝑔 into a mask. Then, for a given image, the error magnitude is minimized by restricting the quantization to the two closest levels per pixel, and the mask-driven choice between them applies the target distribution of error.

4.2 Horizontal search space

We now describe the second, “horizontal” discrete variant of our minimization formulation (5). It considers a single sample set 𝑆𝑖 assigned to each of the 𝑁 pixels, all represented together as a sample-set image 𝑆. The search space comprises all possible permutations Π(𝑆) of these assignments:

Ω = Π(𝑆), with 𝑆 = {𝑆1, . . . , 𝑆𝑁}. (8)

The goal is to find a permutation 𝜋(𝑆) that minimizes the perceptual error (4). The optimization problem (5) thus takes the form

min_{𝜋∈Π(𝑆)} ‖𝑔 ∗ (𝑄(𝜋(𝑆)) − 𝐼)‖₂². (9)

Algorithm 1. Three algorithms to (approximately) solve the vertical search-space optimization problem (7). The output is an image 𝑂 = {𝑂1, . . . , 𝑂𝑁}, given a reference image 𝐼 and a stack of initial image estimates 𝑄1, . . . , 𝑄𝑀. Iterative minimization updates pixels repeatedly, for each selecting the estimate that minimizes the perceptual error (4). Error diffusion quantizes each pixel to the closest estimate, distributing the error to its neighbors based on a kernel 𝜅. Dithering quantizes each pixel in 𝐼 based on thresholds looked up in a dither mask 𝐵 (optimized w.r.t. the kernel 𝑔).

1: function IterativeMinimization(𝑔, 𝐼, 𝑄1, ..., 𝑄𝑀, 𝑂, 𝑇)
2:   𝑂 = {𝑄1,rand, . . . , 𝑄𝑁,rand}  ← Init each pixel to a random estimate
3:   for 𝑇 iterations do
4:     for pixel 𝑖 = 1..𝑁 do  ← E.g., random or serpentine order
5:       for estimate 𝑄𝑖,𝑗 ∈ {𝑄𝑖,1, . . . , 𝑄𝑖,𝑀} do
6:         if setting 𝑂𝑖 = 𝑄𝑖,𝑗 reduces ‖𝑔 ∗ (𝑂 − 𝐼)‖₂² then
7:           𝑂𝑖 = 𝑄𝑖,𝑗  ← Update estimate
8: function ErrorDiffusion(𝜅, 𝐼, 𝑄1, ..., 𝑄𝑀, 𝑂)
9:   𝑂 = 𝐼  ← Initialize solution to reference
10:  for pixel 𝑖 = 1..𝑁 do  ← E.g., serpentine order
11:    𝑂𝑖_old = 𝑂𝑖
12:    𝑂𝑖 ∈ arg min_{𝑄𝑖,𝑗} ‖𝑂𝑖_old − 𝑄𝑖,𝑗‖₂²
13:    𝜖𝑖 = 𝑂𝑖_old − 𝑂𝑖  ← Diffuse error 𝜖𝑖 to yet-unprocessed neighbors
14:    for unprocessed pixel 𝑘 within support of 𝜅 around 𝑖 do
15:      𝑂𝑘 += 𝜖𝑖 ⋅ 𝜅𝑘−𝑖
16: function Dithering(𝐵, 𝐼, 𝑄1, ..., 𝑄𝑀, 𝑂)
17:  for pixel 𝑖 = 1..𝑁 do  ← Find tightest interval [𝑄𝑖_lower, 𝑄𝑖_upper] containing 𝐼𝑖
18:    𝑄𝑖_lower = arg max_{𝑄𝑖,𝑗 : 𝑄𝑖,𝑗 ≤ 𝐼𝑖} 𝑄𝑖,𝑗
19:    𝑄𝑖_upper = arg min_{𝑄𝑖,𝑗 : 𝑄𝑖,𝑗 > 𝐼𝑖} 𝑄𝑖,𝑗
20:    if 𝐼𝑖 − 𝑄𝑖_lower < 𝐵𝑖 ⋅ (𝑄𝑖_upper − 𝑄𝑖_lower) then  ← Set 𝑂𝑖 using threshold 𝐵𝑖
21:      𝑂𝑖 = 𝑄𝑖_lower
22:    else
23:      𝑂𝑖 = 𝑄𝑖_upper

We can explore the permutation space Π(𝑆) by swapping the sample-set assignments between pixels. The minimization requires updating the image estimate 𝑄(𝜋(𝑆)) for each permutation 𝜋(𝑆), i.e., after every swap. Such updates are costly as they involve re-sampling both pixels in each of potentially millions of swaps. We need to eliminate these extra rendering invocations during the optimization to make it practical. To that end, we observe that for pixels solving similar light-transport integrals, swapping their sample sets gives a similar result to swapping their estimates. We therefore restrict the search space to permutations that can be generated through swaps between such (similar) pixels. This enables an efficient optimization scheme that directly swaps the pixel estimates of an initial rendering 𝑄(𝑆).

Error decomposition. Formally, we express the estimate produced by a sample-set permutation in terms of permuting the pixels of the initial rendering: 𝑄(𝜋(𝑆)) = 𝜋(𝑄(𝑆)) + Δ(𝜋). The error Δ is zero when the swapped pixels solve the same integral. Substituting into Eq. (9), we can approximate the perceptual error by (see Appendix A)



𝐸(𝜋) = ‖𝑔 ∗ (𝜋(𝑄(𝑆)) − 𝐼 + Δ(𝜋))‖₂² (10a)
     ≈ ‖𝑔 ∗ (𝜋(𝑄(𝑆)) − 𝐼)‖₂² + ‖𝑔‖₁² Σ𝑖 𝑑(𝑖, 𝜋(𝑖)) = 𝐸𝑑(𝜋), (10b)

where we write the error 𝐸(𝜋) as a function of 𝜋 only, to emphasize that everything else is fixed during the optimization. In the approximation 𝐸𝑑, the term 𝑑(𝑖, 𝜋(𝑖)) measures the dissimilarity between pixel 𝑖 and the pixel 𝜋(𝑖) it is relocated to by the permutation. The purpose of this metric is to predict how different we expect the result of re-estimating the pixels after swapping their sample sets to be compared to simply swapping their initial estimates. It can be constructed based on knowledge or assumptions about the image.

Local similarity assumption. Our implementation uses a simple binary dissimilarity function that returns zero when $i$ and $\pi(i)$ are within some distance $r$ and infinity otherwise. We set $r \in [1, 3]$; it should ideally be locally adapted to the image smoothness. This allows us to restrict the search space $\Pi(S)$ only to permutations that swap adjacent pixels, where it is more likely that $\Delta$ is small. More elaborate heuristics could better account for pixel (dis)similarity.

Iterative minimization. We devise a greedy iterative minimization scheme for this horizontal formulation, similar to the one in Alg. 1. Given an initial image estimate $Q(S)$, produced by randomly assigning a sample set to every pixel, our algorithm goes over all pixels and for each considers swaps within a $(2R+1)^2$ neighborhood; we use $R = 1$. The swap that brings the largest reduction in the perceptual error $E_d$ is accepted. Algorithm 2 provides pseudocode. In our experiments we run $T = 10$ full-image iterations. As before, the algorithm could be terminated based on the swap reduction rate or the error reduction rate. We explore additional optimizations in supplemental Section 3.

The parameter $R$ balances between the cost of one iteration and the amount of exploration it can do. Note that this parameter is different from the maximal relocation distance $r$ in the dissimilarity metric, with $R \leq r$.

Due to the pixel (dis)similarity assumptions, the optimization can produce some mispredictions, i.e., it may swap the estimates of pixels for which swapping the sample sets produces a significantly different result. Thus the image $\pi(Q(S))$ cannot be used directly as a final estimate. We therefore re-render the image using the optimized permutation $\pi$ to obtain the final estimate $Q(\pi(S))$.

4.3 Discussion

Search space. We discretize the search space $\Omega$ to make the optimization problem (5) tractable. To make it truly practical, it is also necessary to avoid repeated image estimation (i.e., $Q(S)$ evaluation) during the search for the solution $S$. Our vertical (7) and horizontal (9) optimization variants are formulated specifically with this goal in mind. All methods in Algs. 1 and 2 operate on pre-generated image estimates that constitute the solution search space.

Our vertical formulation takes a collection of $M$ input estimates $\{Q_{i,j} = Q_i(S_{i,j})\}_{j=1}^{M}$ for every pixel $i$, one for each sample set $S_{i,j}$. Noting that the $Q_{i,j}$ are MC estimates of the true pixel value, this collection can be cheaply expanded to a size as large as $2^M - 1$ by taking the average of the estimates in each of its subsets (excluding the empty subset). In practice only a fraction of these subsets can be used, since the size of the power set grows exponentially with $M$. It may seem that this approach ends up wastefully throwing away most input estimates. But note that these estimates actively participate in the optimization and provide the space of possible solutions. Carefully selecting a subset per pixel can yield a higher-fidelity result than blindly averaging all available estimates, as we will show repeatedly in Section 7.

Algorithm 2. Given a convolution kernel $g$, a reference image $I$, an initial sample-set assignment $S$, and an image estimate $Q(S)$ computed with that assignment, our greedy algorithm iteratively swaps sample-set assignments between neighboring pixels to minimize the perceptual error $E_d$ (10b), producing a permutation $\pi$ of the initial assignment.

1: function IterativeMinimization($g$, $I$, $S$, $Q(S)$, $T$, $R$, $\pi$)
2:   $\pi$ = identity permutation            ← Initialize solution permutation
3:   for $T$ iterations do
4:     for pixel $i = 1..N$ do               ← E.g., random or serpentine order
5:       $\pi' = \pi$                        ← Find best pixel in neighborhood to swap with
6:       for pixel $j$ in $(2R+1)^2$ neighborhood around $i$ do
7:         if $E_d(\pi_{i \leftrightarrows j}(S)) < E_d(\pi'(S))$ then   ← Eq. (10b)
8:           $\pi' = \pi_{i \leftrightarrows j}$   ← Accept swap as current best
9:       $\pi = \pi'$
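The power-set expansion of the per-pixel estimates can be sketched as follows (a toy illustration for a single pixel; the function name is ours):

```python
from itertools import combinations

def expand_estimates(estimates):
    """Expand M per-pixel MC estimates into the averages of all
    2^M - 1 non-empty subsets, enlarging the vertical search space."""
    out = []
    for k in range(1, len(estimates) + 1):
        for subset in combinations(estimates, k):
            out.append(sum(subset) / len(subset))
    return out

# For M = 3 estimates we get 2^3 - 1 = 7 candidate pixel values,
# each an unbiased-on-average combination of the inputs.
candidates = expand_estimates([0.2, 0.4, 0.8])
```

The vertical optimization then selects one candidate per pixel from this enlarged set.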

In contrast, our horizontal formulation builds a search space given just a single input estimate $Q_i$ per pixel. We consciously restrict the space to permutations between nearby pixels, so as to leverage local pixel similarity and avoid repeated pixel evaluation during optimization. The disadvantage of this approach is that it requires re-rendering the image after optimization, with uncertain results (due to mispredictions) that can lead to local degradation of image quality. Mispredictions can be reduced by exploiting knowledge about the rendering function $Q(S)$, e.g., through depth, normal, or texture buffers; we explore this in supplemental Section 2. Additionally, while methods like iterative minimization (Alg. 2) and dithering (Section 5.2) can be adapted to this search space, reformulating other halftoning algorithms such as error diffusion is non-trivial.

A hybrid formulation is also conceivable, taking a single input estimate per pixel (like horizontal methods) and considering a separate (vertical) search space for each pixel constructed by borrowing estimates from neighboring pixels. Such an approach could benefit from advanced halftoning optimization methods, but could also suffer from mispredictions and require re-rendering. We leave the exploration of this approach to future work.

Finally, it is worth noting that discretization is not the only route to practicality. Equation (5) can be optimized over the continuous space $\Omega$ if some cheap-to-evaluate proxy for the rendering function is available. Such a continuous approximation may be analytical (based on prior knowledge or assumptions) or obtained by reconstructing a point-wise evaluation. However, continuous-space optimization can be difficult in high dimensions (e.g., number of light bounces) where non-linearities and non-convexity are exacerbated.

Optimization strategy. Another important choice is the optimization method. For the vertical formulation, iterative minimization provides the best flexibility and quality but is the most computationally expensive. Error diffusion and dithering are faster but only approximately solve Eq. (7).

One difference between classical halftoning and our vertical setting is that quantization levels are non-uniformly distributed and differ between pixels. This further increases the gap in quality between the image-adaptive iterative minimization and error diffusion (which can correct for these differences) and the non-adaptive dithering, compared to the halftoning setting. The main advantage of dithering is that it involves the kernel $g$ explicitly, while the error-diffusion kernel $\kappa$ cannot be related directly to $g$.

5 PRACTICAL APPLICATION

We now turn to the practical use of our error optimization framework. In both our discrete formulations from Section 4, the search space is determined by a given collection of sample sets $S_{i,j}$ for every pixel $i$, with $j = 1 \ldots M$ (in the horizontal setting $M = 1$). The optimization is then driven by the corresponding estimates $Q_{i,j}$. We consider two ways to obtain these estimates, leading to different practical trade-offs: (1) direct evaluation of the samples by rendering a given scene and (2) using a proxy for the rendering function. We show how prior works correspond to using either approach within our framework, which helps expose their implicit assumptions.

5.1 Surrogate for ground truth

The goal of our optimization is to perceptually match an image estimate to the ground truth $I$ as closely as possible. Unfortunately, the ground truth is unknown in our setting, unlike in halftoning. The best we can do is substitute it with a surrogate image $I'$. Such an image can be obtained either from available pixel estimates or by making assumptions about the ground truth. We will discuss specific approaches in the following, but it is already worth noting that all existing error-distribution methods rely on such a surrogate, whether explicitly or implicitly. And since the surrogate guides the optimization, its fidelity directly impacts the fidelity of the output.

5.2 A-posteriori optimization

Given a scene and a viewpoint, initial pixel estimates can be obtained by invoking the renderer with the input samples: $Q_{i,j} = Q_i(S_{i,j})$. A surrogate can then be constructed from those estimates; in our implementation we denoise the estimate-average image (Section 7.1). Having the estimates and the surrogate, we can run any of the methods in Algs. 1 and 2. Vertical algorithms directly output an image $O$; horizontal optimization yields a sample-set image $S$ that requires an additional rendering invocation: $O = Q(S)$.

This general approach of utilizing sampled image information was coined a-posteriori optimization by Heitz and Belcour [2019]; they proposed two such methods. Their histogram method operates in a vertical setting, choosing one of the (sorted) estimates for each pixel based on the respective value in a given blue-noise dither mask. Such sampling corresponds to using an implicit surrogate that is the median estimate for every pixel, which is what the mean of the dither mask maps to. Importantly, any one of the estimates for a pixel can be selected, whereas in classical dithering the choice is between the two quantization levels that tightly envelop the reference value (Section 4.1) [Spaulding et al. 1997]. Such selection can yield large error, especially for pixels whose corresponding mask values deviate strongly from the mask mean. This produces image fireflies that do not appear if simple estimate averages are taken instead (see Fig. 6).

The permutation method of Heitz and Belcour [2019] operates in a horizontal setting. Given an image estimate, it finds pixel permutations within small tiles that minimize the distance between the estimates and the values of a target blue-noise mask. This matching transfers the mask's distribution to the image signal rather than to its error. The two are equivalent only when the signal within each tile is constant. The implicit surrogate in this method is thus a tile-wise constant image (shown more formally in supplemental Section 5). In our framework the use of a surrogate is explicit, which enables full control over the quality of the error distribution.

5.3 A-priori optimization

Optimizing perceptual error is possible even in the absence of information about a specific image. In our framework, the goal of such an a-priori approach (as coined by Heitz and Belcour [2019]) is to compute a sample-set image $S$ by using surrogates for both the ground-truth image $I$ and the rendering function $Q(S)$, constructed based on smoothness assumptions. The samples $S$ can then produce a high-fidelity estimate of any image that meets those assumptions.

Lacking prior knowledge, one could postulate that every pixel $i$ has the same rendering function: $Q_i(\cdot) = Q(\cdot)$; the image surrogate $I'$ is thus constant. While in practice this assumption (approximately) holds only locally, the optimization kernel $g$ is also expected to have compact support. The shape of $Q$ can be assumed to be (piecewise) smooth and approximable by a cheap analytical function $Q'$.

With the above surrogates in place, we can run our algorithms to optimize a sample-set image $S$. The constant-image assumption makes horizontal algorithms well-suited for this setting as it makes the swapping-error term $\Delta$ in Eq. (10a) vanish, simplifying the perceptual error to $E(\pi(S)) = \| g * \pi(\epsilon(S)) \|_2^2$. This enables the optimization to consider swaps between any two pixels in the error image $\epsilon(S)$. That image can be quickly rendered in advance, by invoking the render-function surrogate $Q'$ with the input sample-set image.

Georgiev and Fajardo [2016] take a similar approach, with swapping based on simulated annealing. Their empirically motivated optimization energy uses an explicit (Gaussian) kernel, but instead of computing an error image through a rendering surrogate, it postulates that the distance between two sample sets is representative of the difference between their corresponding pixel errors. Such a smoothness assumption holds for bi-Lipschitz-continuous functions. Their energy can thus be understood to compactly encode properties of a class of rendering functions.

Heitz et al. [2019] adopt the approach of Georgiev and Fajardo [2016], but their energy function replaces the distance between sample sets by the difference between their corresponding pixel errors. The errors are computed using an explicit render-function surrogate. They optimize for a large number of simple surrogates simultaneously, and leverage a compact representation of Sobol sequences to also support progressive sampling. We relate these two prior works to ours more formally in supplemental Section 6, also showing how our perceptual error formulation can be incorporated into the method of Heitz et al. [2019].

The approach of Ahmed and Wonka [2020] performs on-the-fly scrambling of a Sobol sequence applied to the entire image. Image pixels are visited in a Morton Z-order modified to break its regularity. The resulting sampler diffuses Monte Carlo error over hierarchically nested blocks of pixels, giving a perceptually pleasing error distribution. However, the algorithmic nature of this approach introduces more implicit assumptions than prior works, making it difficult to analyze.

Our theoretical formulation and optimization methods enable the construction of a-priori sampling masks in a principled way. For horizontal optimization, we recommend using our iterative algorithm (Alg. 2), which can bring significant performance improvement over simulated annealing; such a speed-up was reported by Analoui and Allebach [1992] for dither-mask construction. Vertical optimization is an interesting alternative, where for each pixel one of several sample sets would be chosen; this would allow for varying the sample count per pixel. Note that the ranking-key optimization for progressive sampling of Heitz et al. [2019] is a form of vertical optimization.

5.4 Discussion

Our formulation expresses a-priori and a-posteriori optimization under a common framework that unifies existing methods. These two approaches come with different trade-offs. A-posteriori optimization utilizes sampled image information, and in a vertical setting requires no assumptions except for what is necessary for surrogate construction. It thus has the potential to achieve high output fidelity, especially on scenes with complex lighting, as it is oblivious to the shape and dimensionality of the rendering function, as first demonstrated by Heitz and Belcour [2019]. However, it requires pre-sampling (also post-sampling in the horizontal setting), and the optimization is sensitive to the surrogate quality.

Making aggressive assumptions allows a-priori optimization to be performed offline once and the produced samples $S$ to be subsequently used to render any image. This approach resembles classical sample stratification where the goal is also to optimize sample distributions w.r.t. some smoothness expectations. In fact, our a-priori formulation subsumes the per-pixel stratification problem, since the perceptual error is minimized when the error image $\epsilon(S)$ has both low magnitude and visually pleasing distribution. Two main factors limit the potential of a-priori optimization, especially on scenes with non-uniform multi-bounce lighting. One is the general difficulty of optimizing sample distributions in high-dimensional spaces. The other is that in such scenes the complex shape of the rendering function, both in screen and sample space, can easily break smoothness assumptions and cause high (perceptual) error.

To test the capabilities of our formulation, in the following we focus on the a-posteriori approach. In the supplemental document we explore a-priori optimization based on our framework. The two approaches can also be combined, e.g., by seeding a-posteriori optimization with a-priori-optimized samples whose good initial guess can improve the quality and convergence speed.

6 EXTENSIONS

Our perceptual error formulation (4) approximates the effect of the HVS PSF through kernel convolution. In this section we analyze the relationship between kernel and viewing distance, as well as the impact of the kernel shape on the error distribution. We also present extensions that account for the HVS non-linearities in handling tone and color.

Kernels and viewing distance. As discussed in Section 3.1, the PSF is usually modelled over a range of viewing distances. The effect of the PSF depends on the frequencies of the viewed signal and the distance from which it is viewed. Pappas and Neuhoff [1999] have found that the Gaussian is a good approximation to the PSF in the context of halftoning. They derived its standard deviation $\sigma$ in terms of the minimum viewing distance for a given screen resolution:

$\sigma = \dfrac{0.00954}{\tau}, \quad \text{where} \quad \tau = \dfrac{180}{\pi} \, 2\arctan\dfrac{1}{2RD}. \qquad (11)$

Here $\tau$ is the visual angle between the centers of two neighboring pixels (in degrees) for screen resolution $R$ (in 1/inches) and viewing distance $D$ (in inches). The minimum viewing distance for a given standard deviation and resolution can be obtained via the inverse formula: $D = \big( 2R \tan\big( \frac{\pi}{180} \frac{0.00954}{2\sigma} \big) \big)^{-1}$. Larger $\sigma$ values correspond to larger observer distances; we demonstrate the effect of that in Fig. 3 where the images become increasingly blurrier. In Fig. 4a, we compare that Gaussian kernel to two well-established PSF models from the halftoning literature [Näsänen 1984; González et al. 2006]. We have found the differences between all three to be negligible; we use the cheaper-to-evaluate Gaussian in all our experiments.
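Equation (11) and its inverse are simple to evaluate; the following sketch (function names are ours) converts between the Gaussian standard deviation and the minimum viewing distance:

```python
import math

def sigma_from_distance(R_dpi, D_inches):
    """Eq. (11): Gaussian PSF standard deviation (in pixels) from screen
    resolution R (dots per inch) and viewing distance D (inches)."""
    tau = (180.0 / math.pi) * 2.0 * math.atan(1.0 / (2.0 * R_dpi * D_inches))
    return 0.00954 / tau

def min_distance_from_sigma(R_dpi, sigma):
    """Inverse formula: minimum viewing distance D in inches."""
    return 1.0 / (2.0 * R_dpi * math.tan((math.pi / 180.0) * 0.00954 / (2.0 * sigma)))
```

For instance, the 3×3 kernel with $\sigma = \sqrt{2/\pi}$ used in Section 7.1 yields a minimum viewing distance of roughly 16 inches at 300 dpi, consistent with the $D = 4792/R$ figure quoted there.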

Decoupling the viewing distances. Being based on perceptual models of the HVS [Sullivan et al. 1991; Analoui and Allebach 1992], our formulation (4) assumes that the estimate $Q$ and the reference $I$ are viewed from the same (range of) distance(s). The two distances can be decoupled by applying different kernels to $Q$ and $I$:

$E = \| g * Q - h * I \|_2^2. \qquad (12)$

Minimizing this error makes $Q$ appear from some distance $D_g$ similar to $I$ seen from a different distance $D_h$. The special case of using a Kronecker delta kernel $h = \delta$, i.e., with the reference $I$ seen from up close, yields $E = \| g * Q - I \|_2^2$. This has been shown to have an edge-enhancing effect [Anastassiou 1989; Pappas and Neuhoff 1999], which we show in Fig. 4b. We use $h = \delta$ in all our experiments.

Tone mapping. Considering that the optimized image will be viewed on media with limited dynamic range (e.g., screen or paper), we can incorporate a tone-mapping operator $\mathcal{T}$ into the perceptual error (4):

$E = \| g * \epsilon_{\mathcal{T}} \|_2^2 = \| g * (\mathcal{T}(Q) - \mathcal{T}(I)) \|_2^2. \qquad (13)$

Doing this also bounds the per-pixel error $\epsilon_{\mathcal{T}} = \mathcal{T}(Q) - \mathcal{T}(I)$, suppressing outliers and making the optimization more robust in scenes with high dynamic range. We illustrate this improvement in Fig. 4c, where an ACES [Arrighetti 2017] tone-mapping operator is applied to the optimized image. Optimizing w.r.t. the original perceptual error (4) yields a noisy and overly dark image compared to the tone-mapped ground truth. Accounting for tone mapping in the optimization through Eq. (13) yields a more faithful result.
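As a small illustration (the pixel values are hypothetical, not from the paper's scenes), the following sketch shows how a clamping tone-mapping operator bounds the per-pixel error and suppresses a firefly outlier:

```python
def tonemap_clamp(v):
    """A simple tone-mapping operator T, as used in Section 7.1:
    clamp a pixel value to the displayable range [0, 1]."""
    return max(0.0, min(1.0, v))

def per_pixel_error(Q, I, tonemap=lambda v: v):
    """Per-pixel error, optionally through a tone-mapping operator."""
    return [tonemap(q) - tonemap(i) for q, i in zip(Q, I)]

Q = [0.2, 50.0, 0.4]   # noisy estimates; the 50.0 is a firefly outlier
I = [0.25, 0.3, 0.35]  # reference values
linear = per_pixel_error(Q, I)                 # outlier dominates the error
mapped = per_pixel_error(Q, I, tonemap_clamp)  # error is bounded to [-1, 1]
```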


[Figure 4: four panels — (a) Kernel comparison: [Näsänen 1984], [González et al. 2006], our kernel; (b) Kernel sharpening effect: $h = g$ vs. $h = \delta$; (c) Tone mapping (ACES): linear vs. tone-mapped error, with ground truth; (d) Color handling: grayscale vs. color error, with ground truth.]

Fig. 4. (a) Our binomial Gaussian approximation $g$ (3×3 pixels, $\sigma = \sqrt{2/\pi}$) performs on par with state-of-the-art halftoning kernels. (b) Setting the reference-image kernel $h$ in Eq. (12) to a zero-width $\delta$ kernel sharpens the output. (c) Incorporating tone mapping via Eq. (13). (d) Incorporating color via Eq. (14).

[Figure 5: panels — Input (white noise); Low-pass (blue noise); Band-stop (green noise); High-pass (red noise); Band-pass (violet noise); Low-pass anisotropic; Spatially varying.]

Fig. 5. Our formulation (5) allows optimizing the error distribution of an image w.r.t. arbitrary kernels. Here we adapt our horizontal iterative minimization (Alg. 2) to directly swap the pixels of a white-noise input image. Insets show the power spectra of the target kernels (top left) and the optimized images (bottom right).

Color handling. While the HVS reacts more strongly to luminance than color, ignoring chromaticity entirely (e.g., by computing the error image $\epsilon$ from per-pixel luminances) can have a negative effect on the distribution of color noise in the image. To that end, we can penalize the perceptual error of each color channel $c \in C$ separately:

$E = \sum_{c \in C} \lambda_c \| g_c * (Q_c - I_c) \|_2^2, \qquad (14)$

where $\lambda_c$ is a per-channel weight. In our experiments, we use an RGB space $C = \{\mathrm{r}, \mathrm{g}, \mathrm{b}\}$, set $\lambda_c = 1$, and use the same kernel $g_c = g$ for every channel. Figure 4d shows the improvement in color noise over using greyscale perceptual error. Depending on the color space, the per-channel kernels may differ (e.g., YCbCr) [Sullivan et al. 1991]. As an alternative, one could decouple the channels from the input estimates and optimize each channel separately, assembling the results into a color image. In a vertical setting, this decoupling extends the optimization search space size from $M$ to $M^C$.

Kernel shape impact. To test the robustness of our framework, we analyze kernels with spectral characteristics other than isotropic blue-noise in Fig. 5. We run our iterative pixel-swapping algorithm (Alg. 2) to optimize the shape of a white-noise input, which produces a spectral distribution inverse to that of the target kernel. The rightmost image in the figure shows the result of using a spatially varying kernel that is a convex combination between a low-pass Gaussian and a high-pass anisotropic kernel, with the interpolation parameter varying horizontally across the image. Our algorithm can adapt the noise shape well.

7 RESULTS

We now present empirical validation of our error optimization framework in the a-posteriori setting described in Section 5.2. We render static images and animations of several scenes, comparing our algorithms to those of Heitz and Belcour [2019].

7.1 Setup

Perceptual error model. We build a perceptual model by combining all extensions from Section 6. Our estimate-image kernel $g$ is a binomial approximation of a Gaussian [Lindeberg 1990]. For performance reasons and to allow smaller viewing distances we use a 3×3-pixel kernel with standard deviation $\sigma = \sqrt{2/\pi}$ (see Fig. 4a).

Plugging this $\sigma$ value into the inverse of Eq. (11), the corresponding minimum viewing distance is $D = 4792/R$ inches for a screen resolution of $R$ dpi (e.g., 16 inches at 300 dpi). We recommend viewing from a larger distance, to reduce the effect of our 3×3 kernel discretization. We use a Dirac reference-image kernel $h = \delta$, and incorporate a simple tone-mapping operator $\mathcal{T}$ that clamps pixel values to $[0, 1]$. The final error model reads:

$E = \sum_{c \in \{\mathrm{r}, \mathrm{g}, \mathrm{b}\}} \| g * \mathcal{T}(Q_c) - \delta * \mathcal{T}(I'_c) \|_2^2, \qquad (15)$

where $I'$ is the surrogate image whose construction we describe below. For dithering we convert RGB colors to luminance, which reduces the number of components in the error (15) to one.

Methods. We compare our four methods from Algs. 1 and 2 to the histogram and permutation methods of Heitz and Belcour [2019]. For our vertical and horizontal iterative minimizations we set the maximum iteration count to 100 and 10 respectively. For error diffusion we use the kernel of Floyd and Steinberg [1976] and for dithering we use a void-and-cluster mask [Ulichney 1993]. For our horizontal iterative minimization we use a search radius $R = 1$ and allow pixels to travel within a disk of radius $r = 1$ from their original location in the dissimilarity metric. For the permutation method of Heitz and Belcour [2019] we obtained best results with tile size 8×8. (Our $r = 1$ approximately corresponds to their tile size 3×3.)

Rendering. All scenes were rendered with PBRT [Pharr et al. 2016] using unidirectional or bidirectional path tracing. None of the methods depend on the sampling dimensionality, though we set the maximum path depth to 5 for all scenes to maintain reasonable rendering times. The ground-truth images have been generated using a Sobol sampler with at least 1024 samples per pixel (spp); for all test renders we use a random sampler. To facilitate numerical-error comparisons between the different methods, we trace the primary rays through the pixel centers.

Surrogate construction. To build a surrogate image for our methods, we filter the per-pixel averaged input estimates using Intel Open Image Denoise [Intel 2018], which also leverages surface-normal and albedo buffers, taking about 0.5 sec for a 512×512 image. Recall that the methods of Heitz and Belcour [2019] utilize implicit surrogates.

Image-quality metrics. We evaluate the quality of some of our results using the HDR-VDP-2 perceptual metric [Mantiuk et al. 2011], with parameters matching our binomial kernel. We compute error-detection probability maps which indicate the likelihood for a human observer to notice a difference from the ground truth.

Additionally, we analyze the local blue-noise quality of the error image $\epsilon = \mathcal{T}(Q) - \mathcal{T}(I)$. We split the image into tiles of 32×32 pixels and compute the Fourier power spectrum of each tile. For visualization purposes, we apply a standard logarithmic transform $c \ln(1 + \hat{\epsilon})$ to every resulting pixel value $\hat{\epsilon}$ and compute the normalization factor $c$ per tile so that the maximum final RGB value within the tile is $(1, 1, 1)$. Note that the error image $\epsilon$ is computed w.r.t. the ground truth $I$ and not the surrogate, which quantifies the blue-noise distribution objectively. The supplemental material contains images of the tiled power spectra for all experiments.
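The per-tile analysis can be sketched as follows, assuming a single-channel tile (a deliberately naive $O(n^4)$ DFT; a practical implementation would use an FFT, and function names are ours):

```python
import cmath
import math

def power_spectrum(tile):
    """Brute-force 2D DFT power spectrum of a square single-channel tile."""
    n = len(tile)
    spec = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0j
            for y in range(n):
                for x in range(n):
                    s += tile[y][x] * cmath.exp(-2j * math.pi * (u * y + v * x) / n)
            spec[u][v] = abs(s) ** 2
    return spec

def log_normalize(spec):
    """Apply c*ln(1 + e) per value, choosing c per tile so the maximum is 1."""
    logged = [[math.log1p(v) for v in row] for row in spec]
    m = max(max(row) for row in logged)
    c = 1.0 / m if m > 0 else 1.0
    return [[c * v for v in row] for row in logged]
```

A blue-noise error tile would show low energy near the DC bin and high energy at high frequencies in the normalized spectrum.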

We compare images quantitatively via traditional MSE as well as a metric derived from our perceptual error formulation. Our perceptual MSE (pMSE) evaluates the error (15) of an estimate image w.r.t. the ground truth, normalized by the number of pixels $N$ and channels $C$: $\mathrm{pMSE} = \frac{E}{N \cdot C}$. It generalizes the MSE with a perceptual, i.e., non-delta, kernel $g$. Table 1 summarizes the results.

7.2 Rendering comparisons

All methods. Figure 6 shows an equal-sample comparison of all methods. Vertical methods select one of the 4 input samples per pixel; horizontal methods are fed a 2-spp average for every pixel, and another 2 spp are used to render the final image after optimization. Our methods consistently perform best visually, with the vertical iterative minimization achieving the lowest perceptual error, as corroborated by the HDR-VDP-2 detection maps. Error diffusion is not far behind in quality and is the fastest of all methods along with dithering. The latter is similar to Heitz and Belcour's histogram method but yields a notably better result thanks to using a superior surrogate and performing the thresholding as in the classical halftoning setting (see Section 5.2). Horizontal methods perform worse due to noisier input data (half spp) and worse surrogates derived from it, and also mispredictions (which necessitate re-rendering). Ours still uses a better surrogate than Heitz and Belcour's permutation and is also able to better fit to it. Notice the low fidelity of the 4-spp average image compared to our vertical methods', even though the latter retain only one of the four input samples for every pixel.

Vertical methods. In Fig. 7 we compare our vertical iterative minimization to the histogram sampling of Heitz and Belcour [2019]. Both select one of several input samples (i.e., estimates) for each pixel. Our method produces a notably better result even when given 16× fewer samples to choose from. The perceptual error of histogram sampling does not vanish with increasing sample count. It dithers pixel intensity rather than pixel error, thus more samples help improve the intensity distribution but not the error magnitude.

Figure 1 shows our most capable method: vertical iterative minimization with search space extended to the power set of the input samples (with size $2^4 - 1 = 15$ for 4 input spp; see Section 4.3). We compare surrogate-driven optimization against the best-case result: optimization w.r.t. the ground truth. Both achieve high fidelity, with little difference between them and with pronounced local blue-noise error distribution corroborated by the tiled power spectra.

Horizontal methods & animation. For rendering static images, horizontal methods are at a disadvantage compared to vertical ones due to the required post-optimization re-rendering. As Heitz and Belcour [2019] note, in an animation setting this sampling overhead can be mitigated by reusing the result of one frame as the initial estimate for the next. In Fig. 8 we compare their permutation method to our horizontal iterative minimization. For theirs we shift a void-and-cluster mask in screen space per frame and apply retargeting, and for ours we traverse the image pixels in a different random order. We intentionally keep the scenes static to test the methods' best-case abilities to improve the error distribution over frames.

Starting from a random initial estimate, our method can benefit from a progressively improving surrogate that helps fine-tune the error distribution via localized pixel swaps. The permutation method operates in greyscale within static non-overlapping tiles. This prevents it from making significant progress after the first frame. While mispredictions cause local deviations from blue noise in both results, these are stronger in the permutation method's. This is evident when comparing the corresponding prediction images, i.e., the results of optimization right before re-rendering. The permutation's retargeting pass breaks the blocky image structure caused by tile-based optimization but increases the number of mispredictions.

The supplemental video shows animations with all methods, where vertical ones are fed a new random estimate per frame. Even without accumulating information over time, these consistently beat the two horizontal methods. The latter suffer from mispredictions under fast motion and perform similarly to one another, though ours remains superior in the presence of temporal smoothness. Mispredictions could be eliminated by optimizing frames independently and splitting the sampling budget into optimization and re-rendering halves (as in Fig. 6), though at the cost of reduced sampling quality.

ACM Trans. Graph., Vol. 41, No. 3, Article 26. Publication date: June 2022.

Perceptual error optimization for Monte Carlo rendering •26:11

[Fig. 6 panels: output images, output zoom-ins, and HDR-VDP-2 error-detection maps (0%–100%) for: Random 1 spp; Random 4-spp average; Vertical: Histogram [Heitz and Belcour 2019] (0.08 sec); Horizontal: Permutation [Heitz and Belcour 2019] (0.07 sec); Horizontal: Iterative (ours, 8.30 sec); Vertical: Dithering (ours, 0.04 sec); Vertical: Error diffusion (ours, 0.04 sec); Vertical: Iterative (ours, 15.2 sec).]

Fig. 6. Comparison of our algorithms against the permutation and histogram methods of Heitz and Belcour [2019] with equal total sampling cost of 4 spp. Bottom row shows HDR-VDP-2 error-detection maps (blue is better, i.e., lower detection probability). The baseline 1-spp and 4-spp images exhibit large perceptual error, while our vertical iterative minimization achieves the highest fidelity. Error diffusion produces similar quality. Dithering is as fast but shows smaller improvement over the baselines, yet significantly outperforms the similar histogram method. Our horizontal iterative optimization does better than the permutation method. Our methods also reduce MSE compared to the 4-spp baseline, even though they do not focus solely on per-pixel error (see Table 1).

[Fig. 7 panels, repeated for three crops: Histogram 1/16 spp; Iterative (ours) 1/4 spp; Histogram 1/64 spp; 4-spp average.]

Fig. 7. With a search space of only 4 spp, our vertical iterative minimization outperforms histogram sampling [Heitz and Belcour 2019] with 16× more input samples. Please zoom in to fully appreciate the differences; the full-size images are included in the supplemental material.


26:12 •Vassillen Chizhov, Iliyan Georgiev, Karol Myszkowski, and Gurprit Singh

[Fig. 8 panels: frames 1 and 16, plus the prediction images, for Permutation vs. Iterative (ours) on three scenes.]

Fig. 8. Comparison of our horizontal iterative minimization against the permutation method of Heitz and Belcour [2019] (with retargeting) on 16-frame sequences of static scenes rendered at 4 spp. Our method does a better job at improving the error distribution frame-to-frame.

Additional comparisons. Figure 9 shows additional results from our horizontal and vertical minimization and error diffusion. We compare these to the permutation method of Heitz and Belcour [2019], which we found to perform better than their histogram approach on static scenes at equal sampling rates. For the horizontal methods we show the results after 16 iterations. Our methods again yield lower error, subjectively and numerically (see Tables 1 and 2).

8 DISCUSSION

8.1 Bias towards surrogate

While ultimately we want to optimize w.r.t. the ground-truth image, in practice we have to rely on a surrogate. In our experiments we use reasonably high-quality surrogates, shown in Fig. 12, to best demonstrate the capabilities of our framework. But when using a surrogate of low quality, fitting too closely to it can produce an estimate with artifacts. In such cases less aggressive fitting may yield lower perceptual error. To explore the trade-off, in Appendix B we augment the perceptual error with a term that penalizes deviations from the initial estimate 𝑄init (which in the case of vertical optimization is obtained by averaging the input per-pixel estimates):

    𝐸𝒞 = (1 − 𝒞) ‖𝑔 ∗ (𝑄 − 𝑄init)‖²₂ + 𝒞 𝐸.    (16)

The parameter 𝒞 ∈ [0, 1] encodes our confidence in the surrogate quality. Setting 𝒞 = 1 reverts to the original formulation (15), while optimizing with 𝒞 = 0 yields the initial image estimate 𝑄init. Optimizing w.r.t. this energy can also be interpreted as projecting the surrogate onto the space of Monte Carlo estimates in Ω, with control over the fitting power of the projection via 𝒞.
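A direct transcription of this blended energy, under assumptions: a Gaussian stands in for the perceptual kernel 𝑔, and the perceptual term 𝐸 is written as the kernel-filtered squared deviation from the surrogate.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blended_energy(Q, Q_init, surrogate, confidence, sigma=1.0):
    """Sketch of the confidence-blended energy of Eq. (16):
    E_C = (1 - C) * ||g * (Q - Q_init)||_2^2 + C * E(Q),
    with E(Q) = ||g * (Q - S)||_2^2 as an illustrative stand-in for the
    paper's perceptual term."""
    g = lambda img: gaussian_filter(img, sigma)
    fidelity = np.sum(g(Q - Q_init) ** 2)       # pull towards initial estimate
    perceptual = np.sum(g(Q - surrogate) ** 2)  # pull towards surrogate
    return (1.0 - confidence) * fidelity + confidence * perceptual
```

At the endpoints the behavior matches the text: with C = 0 the minimizer is the initial estimate itself, and with C = 1 only the surrogate term remains.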

In Fig. 10, we plug the extended error formulation (16) into our vertical iterative minimization. The results indicate that the visually best result is achieved for different values of 𝒞 depending on the surrogate quality. Specifically, when optimizing w.r.t. the ground truth, the fitting should be most aggressive: 𝒞 = 1. Conversely, if the surrogate contains structural artifacts, the optimization should be made less biased to it, e.g., by setting 𝒞 = 0.5. Other ways to control this bias are using a more restricted search space (e.g., non-power-set) and capping the number of minimization iterations of our methods. Note that the methods of Heitz and Belcour [2019] rely on implicit surrogates and energies and thus provide no control over this trade-off. We have found that their permutation method generally avoids tiling artifacts induced by their piecewise-constant surrogate due to the retargeting step blurring the prediction image (shown in Fig. 8 zoom-ins); however, this blurring adds mispredictions which deteriorate the final image quality. Our methods provide better fits, target the error explicitly, and are much superior when the surrogate is good. With a bad surrogate, ours can be controlled to never do worse than theirs.

8.2 Denoising

Our images are optimized for eliminating error and preserving features when blurred with a given kernel. This blurring can be seen as a simple form of denoising, and it is reasonable to expect that the images are also better suited for general-purpose denoising than traditional white-noise renderings are [Heitz and Belcour 2019; Belcour and Heitz 2021]. However, we have found that obtaining such benefit is not straightforward.

[Fig. 9 panels, shown for three scenes: 4-spp average; Horizontal: Permutation; Horizontal: Iterative (ours); Vertical: Error diffusion (ours); Vertical: Iterative (ours).]

Fig. 9. Comparison of our methods against the permutation approach of Heitz and Belcour [2019] at 4 spp; for the horizontal methods we show the result of the 16th frame of static-scene rendering. Our two iterative minimization algorithms yield the best quality, while error diffusion is fastest (see Tables 1 and 2).

[Fig. 10 panels: columns show surrogates (ground truth; denoised per-pixel average; tile-wise sample average); rows show the surrogate image and results for 𝒞 = 1, 𝒞 = 0.75, 𝒞 = 0.5, 𝒞 = 0.]

Fig. 10. Balancing our iterative optimization between the surrogate (top row) and the initial estimate (bottom row) via the parameter 𝒞 from Eq. (16). For high-quality surrogates (left and middle columns), the best result is achieved for values of 𝒞 close to 1. In contrast, strong structural artifacts (right column) call for lowering 𝒞 to avoid fitting too closely to the surrogate. The (subjectively) best image in each column is outlined in red.

In Fig. 11 we run Intel Open Image Denoise on the results from our vertical iterative minimization. On the left scene, the input samples ➀ have white-noise image distribution with large magnitude; feeding their per-pixel averages to the denoiser, it cannot reliably separate the signal from the noise and produces conspicuous artifacts. Using this denoised image ➁ as a surrogate for our optimization yields a "regularized" version ➂ of the input that is easier for the denoiser to subsequently filter ➃. This process can be seen as projecting the initial denoised image back onto the space of exact per-pixel estimates (while minimizing the pMSE) whose subsequent denoising avoids artifacts. Note that obtaining this improved result requires no additional pixel sampling.
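The denoise-optimize-denoise loop of steps ➀-➃ can be sketched as follows; a Gaussian blur stands in for the actual denoiser [Intel 2018], and per-pixel nearest-estimate selection stands in for the full vertical iterative optimization.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def regularize_then_denoise(estimates, denoise=lambda im: gaussian_filter(im, 2.0)):
    """Sketch of the loop in Fig. 11: denoise the per-pixel average to get a
    surrogate, re-select per-pixel estimates against it (projecting the
    denoised image back onto the space of exact per-pixel estimates), and
    denoise the regularized result.

    estimates: (H, W, S) array of S Monte Carlo estimates per pixel."""
    avg = estimates.mean(axis=-1)                       # (1) noisy input average
    surrogate = denoise(avg)                            # (2) denoised surrogate
    idx = np.abs(estimates - surrogate[..., None]).argmin(axis=-1)
    ours = np.take_along_axis(estimates, idx[..., None], axis=-1)[..., 0]  # (3)
    return ours, denoise(ours)                          # (3) regularized, (4) denoised
```

Each output pixel of the regularized image is an exact Monte Carlo estimate of that pixel, so no additional sampling is needed.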

On the right scene in Fig. 11, the moderate input-noise level is easy for the denoiser to clean while preserving the faint shadow on the wall. Our optimization subsequently produces an excellent result which yields a high-fidelity image when convolved with the optimization kernel 𝑔. Yet that same result is ruined by the denoiser, which eradicates the shadow, even though subjectively its signal-to-noise ratio is higher than that of the input image. Overall, the denoiser blurs our result ➂ aggressively on both scenes, eliminating not only the high-frequency noise but also lower-frequency signal not present in auxiliary input feature buffers (depth, normals, etc.).

It should not be too surprising that an image optimized for one smoothing kernel does not always yield good results when filtered with other kernels. As an example, Fig. 5 shows clearly that the optimal noise distribution varies significantly across different kernels. While our kernel 𝑔 has narrow support and fixed shape, denoising kernels vary wildly over the image and are inferred from the input in order to preserve features. Importantly, modern kernel-inference models (like in the used denoiser) are designed (or trained) to expect mutually uncorrelated pixel estimates [Intel 2018]. This white-noise-error assumption can also yield wide smoothing kernels that are unnecessarily aggressive for blue-noise distributions; we suspect this is what hinders the denoiser from detecting features present in our optimized results, whose pixels are highly correlated.

[Fig. 11 panels, shown for two scenes: ➀ 4-spp input; ➁ input denoised; ➂ ours; ➃ ours denoised; ours 𝑔-convolved.]

Fig. 11. By regularizing a noisy input, our optimization can help a denoiser avoid producing artifacts (left scene), even though it targets a different (perceptual) smoothing kernel 𝑔. However, it can also cause elimination of image features during denoising (right scene, the shadow).

Our firm belief is that denoising could consistently benefit from error optimization, though that would require better coordination between the two. One avenue for future work would be to tailor the optimization to the kernels employed by a target denoiser. Conversely, denoising could be adapted to ingest correlated pixel estimates with high-frequency error distribution; this would enable the use of less aggressive smoothing kernels (see Fig. 3) and facilitate feature preservation. As a more immediate treatment, image features could be enhanced before or after our optimization to mitigate the risk of them being eliminated by denoising.

8.3 Performance and utility

Throughout our experiments, we have found that the tested algorithms rank in the following order in terms of increasing ability to minimize perceptual error on static scenes at equal sampling cost: histogram sampling, our dithering, permutation, our error diffusion, our horizontal iterative, our vertical iterative. The three lowest-ranked methods employ some form of dithering, which by design assumes (a) constant image signal and (b) equi-spaced quantization levels shared by all pixels. The latter assumption is severely broken in the rendering setting, where the "quantization levels" arise from (random) pixel estimation. Our vertical methods (dithering, error diffusion, iterative) are more practical than the histogram sampling of Heitz and Belcour [2019] as they can achieve high fidelity with a much lower input-sample count. Horizontal algorithms are harder to control due to their mispredictions, which are further exacerbated when reusing estimates across frames in dynamic scenes.

Our iterative minimizations can best adapt to the input and also directly benefit from the extensions in Section 6 (unlike all others). However, they are also the slowest, as evident in Table 2. Fortunately, they can be sped up by several orders of magnitude through additional optimizations from the halftoning literature [Analoui and Allebach 1992; Koge et al. 2014]; we discuss these optimizations in the context of our rendering setting in supplemental Section 3.

[Fig. 12 panels: Modern living room, Grey & white room, San Miguel, Wooden staircase, Japanese classroom, White room, Bathroom, Modern hall.]

Fig. 12. Collage of the surrogates used in our experiments, obtained by denoising the input estimates using Intel Open Image Denoise [Intel 2018].

Error diffusion is often on par with vertical iterative minimization in quality and with dithering-based methods in run time. In a single-threaded implementation it can outperform all others; parallel error-diffusion variants exist too [Metaxas 2003].
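A minimal sketch of such a vertical error-diffusion pass, under assumptions: the classic Floyd-Steinberg weights and scanline order stand in for whatever order and weights are actually used, and the surrogate value plus diffused residual serves as the per-pixel target among which the nearest sample estimate is selected.

```python
import numpy as np

def vertical_error_diffusion(estimates, surrogate):
    """For each pixel (in scanline order) pick, among that pixel's sample
    estimates, the one closest to the surrogate value plus diffused error,
    then distribute the residual to unvisited neighbors.

    estimates: (H, W, S) array of S per-pixel Monte Carlo estimates.
    surrogate: (H, W) target image guiding the selection."""
    h, w, _ = estimates.shape
    target = surrogate.astype(float).copy()  # running target incl. diffused error
    out = np.empty((h, w))
    for y in range(h):
        for x in range(w):
            candidates = estimates[y, x]
            out[y, x] = candidates[np.argmin(np.abs(candidates - target[y, x]))]
            err = target[y, x] - out[y, x]
            # Floyd-Steinberg error distribution to unprocessed pixels
            for dy, dx, wgt in ((0, 1, 7/16), (1, -1, 3/16), (1, 0, 5/16), (1, 1, 1/16)):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    target[yy, xx] += wgt * err
    return out
```

Unlike classic halftoning, the "quantization levels" here differ per pixel: they are that pixel's own Monte Carlo estimates, so the output remains an exact per-pixel estimate everywhere.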

Practical utility. Our methods can enhance the perceptual fidelity of static and dynamic renderings as demonstrated by our experiments. For best results and maximum flexibility, we suggest using our vertical iterative optimization, optionally with the efficiency improvements mentioned above. Figure 10 illustrates that in practical scenarios (middle and right columns) this method can improve upon both the surrogate (top row) and the input-estimate average (bottom row) for a suitable value of the confidence parameter 𝒞. For maximum efficiency we recommend using our vertical error diffusion. To obtain a surrogate, we recommend regularizing the input estimates via fast denoising or more basic bilateral or non-local-means filtering. Our optimization can then be interpreted as reducing bias or artifacts in such denoised images (see Fig. 10). Simple denoising of the result may yield better quality than traditional aggressive denoising of the input samples.

Progressive rendering. Our optimization methods produce biased pixel estimates through manipulating the input samples; this is true even for a-priori methods where the sampling is completely deterministic. Nevertheless, consistency can be achieved through a simple progressive-rendering scheme: For each pixel, newly generated samples are cumulatively averaged into a fixed set of per-pixel estimates that are periodically passed to the optimization to obtain an updated image. Each individual estimate will converge to the true pixel value, thus the optimized image will also approach the ground truth, with bounded memory footprint. Interestingly, convergence is guaranteed regardless of the optimization method and surrogate used, though better methods and surrogates will yield better starting points. Lastly, adaptive sampling is naturally supported by vertical methods as they are agnostic of differences in sample counts between pixels.
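The progressive scheme above can be sketched as a small bookkeeping class; the names and the round-robin slot assignment are illustrative assumptions.

```python
import numpy as np

class ProgressiveEstimates:
    """Each pixel keeps a fixed number of running-average estimates; new
    sampling passes are folded into them round-robin, and the current set
    is periodically handed to the (vertical) optimizer. Every estimate
    converges to the true pixel value, so any selection among them
    converges too, with bounded memory."""
    def __init__(self, height, width, num_estimates):
        self.sums = np.zeros((height, width, num_estimates))
        self.counts = np.zeros(num_estimates, dtype=int)
        self.next_slot = 0

    def add_pass(self, samples):
        """Fold one full-image sampling pass (H, W) into the next slot."""
        s = self.next_slot
        self.sums[..., s] += samples
        self.counts[s] += 1
        self.next_slot = (s + 1) % self.sums.shape[-1]

    def estimates(self):
        """Current per-pixel estimate set, ready for optimization."""
        return self.sums / np.maximum(self.counts, 1)
```
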


Table 1. MSE and pMSE (Section 7.1) metrics for various methods (ours in bold) and scenes. For horizontal methods we show the metrics for the 16th frame of static-scene rendering. In each section we highlight the lowest error number per column. For the same number of samples per pixel (spp), our methods consistently outperform those of Heitz and Belcour [2019], the current state of the art, except that our dithering can do worse than their permutation method.

Method                                       | Bathroom    | Classroom   | Gray Room   | Living Room | Modern Hall | San Miguel  | Staircase   | White Room
                                             | MSE   pMSE  | MSE   pMSE  | MSE   pMSE  | MSE   pMSE  | MSE   pMSE  | MSE   pMSE  | MSE   pMSE  | MSE   pMSE
                                             | ×10−2 ×10−3 | ×10−2 ×10−3 | ×10−2 ×10−2 | ×10−2 ×10−3 | ×10−2 ×10−2 | ×10−2 ×10−3 | ×10−3 ×10−3 | ×10−2 ×10−3
Random (4-spp average)                       | 1.40  3.15  | 3.13  7.91  | 7.91  3.02  | 3.37  5.61  | 5.22  1.70  | 3.58  8.92  | 8.88  5.60  | 2.78  7.98
Vertical: Histogram [2019] (1/4 spp)         | 3.58  6.29  | 7.11  13.08 | 11.49 6.67  | 5.75  9.88  | 11.43 3.60  | 6.84  16.52 | 18.90 6.69  | 5.75  14.09
Vertical: Error diffusion (1/4 spp)          | 1.22  2.27  | 4.91  7.03  | 8.76  2.82  | 2.08  2.31  | 4.86  1.33  | 5.07  8.50  | 6.87  5.08  | 2.19  5.16
Vertical: Dithering (1/4 spp)                | 1.31  3.31  | 4.36  11.63 | 8.46  5.07  | 2.27  4.43  | 5.25  1.80  | 3.74  11.19 | 7.80  5.36  | 2.51  7.95
Vertical: Iterative (1/4 spp)                | 2.32  2.02  | 6.00  6.10  | 9.07  2.97  | 4.32  1.86  | 7.15  1.29  | 5.51  7.05  | 10.50 4.45  | 3.98  5.00
Vertical: Iterative (power set, 1/15 "spp")  | 1.26  1.66  | 3.12  4.91  | 7.53  2.82  | 2.46  1.13  | 4.55  1.18  | 3.31  5.85  | 7.08  4.31  | 2.26  4.58
Horizontal: Permut. [2019] (frame 16, 4 spp) | 1.40  2.79  | 3.15  7.25  | 7.90  2.84  | 3.38  3.14  | 5.21  1.51  | 3.59  8.51  | 8.87  5.40  | 2.72  6.73
Horizontal: Iterative (frame 16, 4 spp)      | 1.52  2.06  | 3.83  5.31  | 8.34  2.41  | 3.59  1.59  | 5.46  1.18  | 3.94  7.31  | 7.67  4.30  | 2.93  4.72
Random (16-spp average)                      | 0.49  1.47  | 1.55  4.89  | 3.77  1.04  | 1.23  2.18  | 2.14  0.80  | 1.10  4.67  | 3.39  3.78  | 1.35  3.62
Vertical: Histogram [2019] (4/16 spp)        | 1.40  2.37  | 3.12  6.20  | 7.88  2.72  | 3.36  3.57  | 5.23  1.48  | 3.52  6.82  | 7.13  4.09  | 2.77  5.77
Vertical: Error diffusion (4/16 spp)         | 0.41  1.20  | 0.94  3.85  | 4.00  0.87  | 0.86  1.07  | 1.68  0.66  | 1.33  4.70  | 2.76  3.69  | 0.73  2.13
Vertical: Dithering (4/16 spp)               | 0.50  1.52  | 1.15  4.69  | 4.12  1.36  | 1.09  1.82  | 1.93  0.83  | 1.49  5.38  | 3.09  3.73  | 0.91  2.98
Vertical: Iterative (4/16 spp)               | 0.90  1.10  | 2.03  3.35  | 5.17  0.84  | 2.30  0.84  | 3.03  0.64  | 2.39  4.02  | 4.46  3.14  | 1.75  1.99

Table 2. Optimization run times (in seconds) for various methods (ours in bold) and scenes using 4 input samples per pixel (spp), excluding sampling and surrogate construction. For horizontal methods we report the average time over 16 frames. Our error diffusion and dithering avoid sorting and are fastest; though dithering-based, Heitz and Belcour's approaches use sorting. Our iterative minimization methods are slowest (but can be sped up; see Section 8.3).

Method                                       | Bathroom | Classroom | Gray Room | Living Room | Modern Hall | San Miguel | Staircase | White Room
Vertical: Histogram [2019] (1/4 spp)         | 0.06     | 0.07      | 0.11      | 0.06        | 0.02        | 0.09       | 0.08      | 0.06
Vertical: Error diffusion (1/4 spp)          | 0.04     | 0.03      | 0.04      | 0.04        | 0.01        | 0.06       | 0.04      | 0.04
Vertical: Dithering (1/4 spp)                | 0.04     | 0.03      | 0.04      | 0.04        | 0.01        | 0.05       | 0.04      | 0.04
Vertical: Iterative (1/4 spp)                | 18.44    | 111.41    | 12.82     | 15.26       | 5.43        | 29.09      | 15.21     | 19.45
Vertical: Iterative (power set, 1/15 "spp")  | 95.09    | 404.12    | 59.69     | 83.41       | 23.93       | 137.89     | 35.39     | 102.05
Horizontal: Permutation [2019] (frame 16)    | 0.10     | 0.10      | 0.10      | 0.11        | 0.03        | 0.21       | 0.10      | 0.14
Horizontal: Iterative (frame 16)             | 23.04    | 21.57     | 22.00     | 30.08       | 8.48        | 36.36      | 23.78     | 22.76

9 CONCLUSION

We devise a formal treatment of image-space error distribution in Monte Carlo rendering from both quantitative and perceptual aspects. Our formulation bridges the gap between halftoning and rendering by interpreting the error distribution problem as an extension of non-uniform multi-tone energy minimization halftoning. To guide the distribution of rendering error, we employ a perceptual kernel-based model whose practical optimization can deliver improvements not achievable by prior methods given the same sampling data. Our model provides valuable insights as well as a framework to further study the problem and its solutions.

A promising avenue for future research is to adapt even stronger perceptual error models. Prior work has already demonstrated a strong potential in reducing Monte Carlo noise visibility error using visual masking [Bolin and Meyer 1998; Ramasubramanian et al. 1999]. Robust metrics, other than the squared ℒ₂ norm, can also be considered, with possible nonlinear relationships.

Our framework could conceivably be extended beyond the human visual system, i.e., for optimizing the inputs to other types of image processing such as denoising. For such tasks, one could consider lifting the assumption of a fixed kernel to obtain an even more general problem where the kernel and sample distribution are optimized simultaneously (or alternatingly).

ACKNOWLEDGMENTS

Our results show scenes (summarized in Fig. 12) coming from third parties. We acknowledge the PBRT scene repository for San Miguel and Bathroom. Wooden staircase, Modern hall, Modern living room, Japanese classroom, White room, Grey & white room, and Utah teapot have been provided by Benedikt Bitterli. The first author is funded from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement №741215, ERC Advanced Grant INCOVID).

REFERENCES

Abdalla G. M. Ahmed and Peter Wonka. 2020. Screen-Space Blue-Noise Diffusion of Monte Carlo Sampling Error via Hierarchical Ordering of Pixels. ACM Trans. Graph. (Proc. SIGGRAPH Asia) 39, 6, Article 244 (2020). https://doi.org/10.1145/3414685.3417881
Jan P. Allebach and B. Liu. 1976. Random quasi-periodic halftone process. Journal of the Optical Society of America 66, 9 (Sep 1976), 909–917. https://doi.org/10.1364/JOSA.66.000909
Mostafa Analoui and Jan P. Allebach. 1992. Model-based halftoning using direct binary search. In Human Vision, Visual Processing, and Digital Display III, Bernice E. Rogowitz (Ed.), Vol. 1666. International Society for Optics and Photonics, SPIE, 96–108. https://doi.org/10.1117/12.135959
Dimitris Anastassiou. 1989. Error diffusion coding for A/D conversion. IEEE Transactions on Circuits and Systems 36, 9 (1989), 1175–1186. https://doi.org/10.1109/31.34663

Walter Arrighetti. 2017. The Academy Color Encoding System (ACES): A Professional Color-Management Framework for Production, Post-Production and Archival of Still and Motion Pictures. Journal of Imaging 3 (09 2017), 40. https://doi.org/10.3390/jimaging3040040

Peter G.J. Barten. 1999. Contrast sensitivity of the human eye and its effects on image quality. SPIE – The International Society for Optical Engineering. https://doi.org/10.1117/3.353254

Bryce E. Bayer. 1973. An optimum method for two-level rendition of continuous-tone pictures. In Proceedings of IEEE International Conference on Communications, Conference Record, Vol. 26. 11–15.

Laurent Belcour and Eric Heitz. 2021. Lessons Learned and Improvements When Building Screen-Space Samplers with Blue-Noise Error Distribution. In ACM SIGGRAPH 2021 Talks (Virtual Event, USA) (SIGGRAPH '21). Association for Computing Machinery, New York, NY, USA, Article 9, 2 pages. https://doi.org/10.1145/3450623.3464645
Mark R. Bolin and Gary W. Meyer. 1995. A Frequency Based Ray Tracer. In Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '95). Association for Computing Machinery, New York, NY, USA, 409–418. https://doi.org/10.1145/218380.218497
Mark R. Bolin and Gary W. Meyer. 1998. A Perceptually Based Adaptive Sampling Algorithm. In Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '98). Association for Computing Machinery, New York, NY, USA, 299–309. https://doi.org/10.1145/280814.280924

A. Celarek, W. Jakob, M. Wimmer, and J. Lehtinen. 2019. Quantifying the Error of Light Transport Algorithms. Computer Graphics Forum 38, 4 (2019), 111–121. https://doi.org/10.1111/cgf.13775
Chakravarty R. Alla Chaitanya, Anton S. Kaplanyan, Christoph Schied, Marco Salvi, Aaron Lefohn, Derek Nowrouzezahrai, and Timo Aila. 2017. Interactive Reconstruction of Monte Carlo Image Sequences Using a Recurrent Denoising Autoencoder. ACM Trans. Graph. 36, 4, Article 98 (jul 2017), 12 pages. https://doi.org/10.1145/3072959.3073601
Jianghao Chang, Benoît Alain, and Victor Ostromoukhov. 2009. Structure-Aware Error Diffusion. ACM Trans. Graph. 28, 5 (dec 2009), 1–8. https://doi.org/10.1145/1618452.1618508

Scott J. Daly. 1987. Subroutine for the Generation of a Two Dimensional Human Visual Contrast Sensitivity Function. Technical Report 233203Y. Eastman Kodak: Rochester, NY, USA.
Scott J. Daly. 1992. Visible differences predictor: an algorithm for the assessment of image fidelity. In Human Vision, Visual Processing, and Digital Display III, Bernice E. Rogowitz (Ed.), Vol. 1666. International Society for Optics and Photonics, SPIE, 2–15. https://doi.org/10.1117/12.135952
Robin J. Deeley, Neville Drasdo, and W. Neil Charman. 1991. A simple parametric model of the human ocular modulation transfer function. Ophthalmic and Physiological Optics 11, 1 (1991), 91–93. https://doi.org/10.1111/j.1475-1313.1991.tb00200.x

James A. Ferwerda, Sumanta N. Pattanaik, Peter Shirley, and Donald P. Greenberg. 1996. A Model of Visual Adaptation for Realistic Image Synthesis. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '96). Association for Computing Machinery, New York, NY, USA, 249–258. https://doi.org/10.1145/237170.237262
Robert W. Floyd and Louis Steinberg. 1976. An Adaptive Algorithm for Spatial Greyscale. Proceedings of the Society for Information Display 17, 2 (1976), 75–77.

Iliyan Georgiev and Marcos Fajardo. 2016. Blue-Noise Dithered Sampling. In ACM SIGGRAPH 2016 Talks (Anaheim, California) (SIGGRAPH '16). Association for Computing Machinery, New York, NY, USA, Article 35, 1 pages. https://doi.org/10.1145/2897839.2927430
A. J. González, J. Bacca, G. R. Arce, and D. L. Lau. 2006. Alpha stable human visual system models for digital halftoning. In Human Vision and Electronic Imaging XI, Bernice E. Rogowitz, Thrasyvoulos N. Pappas, and Scott J. Daly (Eds.), Vol. 6057. International Society for Optics and Photonics, SPIE, 180–191. https://doi.org/10.1117/12.643540

Eric Heitz and Laurent Belcour. 2019. Distributing Monte Carlo Errors as a Blue Noise in Screen Space by Permuting Pixel Seeds Between Frames. Computer Graphics Forum 38, 4 (2019), 149–158. https://doi.org/10.1111/cgf.13778
Eric Heitz, Laurent Belcour, V. Ostromoukhov, David Coeurjolly, and Jean-Claude Iehl. 2019. A Low-Discrepancy Sampler That Distributes Monte Carlo Errors as a Blue Noise in Screen Space. In ACM SIGGRAPH 2019 Talks (Los Angeles, California) (SIGGRAPH '19). Association for Computing Machinery, New York, NY, USA, Article 68, 2 pages. https://doi.org/10.1145/3306307.3328191
Edwin Hewitt and Kenneth A. Ross. 1994. Abstract Harmonic Analysis: Volume I, Structure of Topological Groups, Integration Theory, Group Representations. Springer New York.
Sam Hocevar and Gary Niger. 2008. Reinstating Floyd-Steinberg: Improved Metrics for Quality Assessment of Error Diffusion Algorithms, Vol. 5099. 38–45. https://doi.org/10.1007/978-3-540-69905-7_5

Intel. 2018. Intel Open Image Denoise. https://www.openimagedenoise.org.
James T. Kajiya. 1986. The Rendering Equation. SIGGRAPH Comput. Graph. 20, 4 (aug 1986), 143–150. https://doi.org/10.1145/15886.15902
Anton S. Kaplanyan, Anton Sochenov, Thomas Leimkühler, Mikhail Okunev, Todd Goodall, and Gizem Rufo. 2019. DeepFovea: Neural Reconstruction for Foveated Rendering and Video Compression Using Learned Statistics of Natural Videos. ACM Trans. Graph. 38, 6, Article 212 (nov 2019), 13 pages. https://doi.org/10.1145/3355089.3356557
Hiroaki Koge, Yasuaki Ito, and Koji Nakano. 2014. A GPU Implementation of Clipping-Free Halftoning Using the Direct Binary Search. In Algorithms and Architectures for Parallel Processing, Xian-he Sun, Wenyu Qu, Ivan Stojmenovic, Wanlei Zhou, Zhiyang Li, Hua Guo, Geyong Min, Tingting Yang, Yulei Wu, and Lei Liu (Eds.). Springer International Publishing, Cham, 57–70. https://doi.org/10.1007/978-3-319-11197-1_5

Lauwerens Kuipers and Harald Niederreiter. 1974. Uniform Distribution of Sequences. Wiley, New York, USA.
Alexandr Kuznetsov, Nima Khademi Kalantari, and Ravi Ramamoorthi. 2018. Deep Adaptive Sampling for Low Sample Count Rendering. Computer Graphics Forum 37, 4 (2018), 35–44. https://doi.org/10.1111/cgf.13473
Daniel L. Lau and Gonzalo R. Arce. 2007. Modern Digital Halftoning, Second Edition. CRC Press, Inc., USA.
Tony Lindeberg. 1990. Scale-space for discrete signals. IEEE Transactions on Pattern Analysis and Machine Intelligence 12, 3 (1990), 234–254. https://doi.org/10.1109/34.49051
Jeffrey Lubin. 1995. A Visual Discrimination Model for Imaging System Design and Evaluation. In Vision Models for Target Detection and Recognition, Eli Peli (Ed.). World Scientific Publishing Company, Inc., 245–283. https://doi.org/10.1142/9789812831200_0010

James L. Mannos and David J. Sakrison. 1974. The effects of a visual fidelity criterion of the encoding of images. IEEE Transactions on Information Theory 20, 4 (1974), 525–536. https://doi.org/10.1109/TIT.1974.1055250
Rafał Mantiuk, Scott J. Daly, Karol Myszkowski, and Hans-Peter Seidel. 2005. Predicting visible differences in high dynamic range images: model and its calibration. In Human Vision and Electronic Imaging X, Bernice E. Rogowitz, Thrasyvoulos N. Pappas, and Scott J. Daly (Eds.), Vol. 5666. International Society for Optics and Photonics, SPIE, 204–214. https://doi.org/10.1117/12.586757
Rafał Mantiuk, Kil Joong Kim, Allan G. Rempel, and Wolfgang Heidrich. 2011. HDR-VDP-2: A Calibrated Visual Metric for Visibility and Quality Predictions in All Luminance Conditions. ACM Trans. Graph. 30, 4, Article 40 (jul 2011), 14 pages. https://doi.org/10.1145/2010324.1964935
Panagiotis Takis Metaxas. 2003. Parallel Digital Halftoning by Error-Diffusion. In Proceedings of the Paris C. Kanellakis Memorial Workshop on Principles of Computing & Knowledge. 35–41. https://doi.org/10.1145/778348.778355

Don P. Mitchell. 1991. Spectrally Optimal Sampling for Distribution Ray Tracing. In Proc. ACM SIGGRAPH, Vol. 25. 157–164. https://doi.org/10.1145/127719.122736
Theophano Mitsa and Kevin J. Parker. 1991. Digital halftoning using a blue noise mask. In Proceedings of International Conference on Acoustics, Speech, and Signal Processing. 2809–2812 vol. 4. https://doi.org/10.1109/ICASSP.1991.150986
Kathy T. Mullen. 1985. The contrast sensitivity of human colour vision to red-green and blue-yellow chromatic gratings. Journal of Physiology 359 (1985), 381–400. https://doi.org/10.1113/jphysiol.1985.sp015591
Karol Myszkowski. 1998. The Visible Differences Predictor: applications to global illumination problems. In Rendering Techniques '98, George Drettakis and Nelson Max (Eds.). Springer Vienna, Vienna, 223–236. https://doi.org/10.1007/978-3-7091-6453-2_21
Risto Näsänen. 1984. Visibility of halftone dot textures. IEEE Transactions on Systems, Man, and Cybernetics SMC-14, 6 (1984), 920–924. https://doi.org/10.1109/TSMC.1984.6313320

Harald Niederreiter. 1992. Random Number Generation and quasi-Monte Carlo Methods. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.
Victor Ostromoukhov. 2001. A Simple and Efficient Error-Diffusion Algorithm (SIGGRAPH '01). Association for Computing Machinery, New York, NY, USA, 567–572. https://doi.org/10.1145/383259.383326
Wai-Man Pang, Yingge Qu, Tien-Tsin Wong, Daniel Cohen-Or, and Pheng-Ann Heng. 2008. Structure-Aware Halftoning. ACM Trans. Graph. 27, 3 (aug 2008), 1–8. https://doi.org/10.1145/1360612.1360688
Thrasyvoulos N. Pappas and David L. Neuhoff. 1999. Least-squares model-based halftoning. IEEE Transactions on Image Processing 8, 8 (Aug 1999), 1102–1116. https://doi.org/10.1109/83.777090

Eli Peli, Jian Yang, and Robert B. Goldstein. 1991. Image invariance with changes in size:

the role of peripheral contrast thresholds. Journal of the Optical Society of America

8, 11 (Nov 1991), 1762–1774. https://doi.org/10.1364/JOSAA.8.001762

Matt Pharr, Wenzel Jakob, and Greg Humphreys. 2016. Physically Based Rendering:

From Theory To Implementation (3rd ed.). Morgan Kaufmann Publishers Inc.

Mahesh Ramasubramanian, Sumanta N. Pattanaik, and Donald P. Greenberg. 1999.

A Perceptually Based Physical Error Metric for Realistic Image Synthesis. In Pro-

ceedings of the 26th Annual Conference on Computer Graphics and Interactive Tech-

niques (SIGGRAPH ’99). ACM Press/Addison-Wesley Publishing Co., USA, 73–82.

https://doi.org/10.1145/311535.311543

Ansari Rashid, Guillemot Christine, and Memon Nasir. 2005. 5.5 - Lossy Image Com-

pression: JPEG and JPEG2000 Standards. In Handbook of Image and Video Processing

(Second Edition) (second edition ed.), AL BOVIK (Ed.). Academic Press, Burlington,

709–XXII. https://doi.org/10.1016/B978-012119792- 6/50105-4

ACM Trans. Graph., Vol. 41, No. 3, Article 26. Publication date: June 2022.


J.G. Robson and Norma Graham. 1981. Probability summation and regional variation in contrast sensitivity across the visual field. Vision Research 21, 3 (1981), 409–418. https://doi.org/10.1016/0042-6989(81)90169-3
Gurprit Singh, Cengiz Öztireli, Abdalla G.M. Ahmed, David Coeurjolly, Kartic Subr, Oliver Deussen, Victor Ostromoukhov, Ravi Ramamoorthi, and Wojciech Jarosz. 2019. Analysis of Sample Correlations for Monte Carlo Rendering. Computer Graphics Forum 38, 2 (2019), 473–491. https://doi.org/10.1111/cgf.13653
Givago da Silva Souza, Bruno Duarte Gomes, and Luiz Carlos L. Silveira. 2011. Comparative neurophysiology of spatial luminance contrast sensitivity. Psychology & Neuroscience 4 (06 2011), 29–48. https://doi.org/10.3922/j.psns.2011.1.005
Kevin E. Spaulding, Rodney L. Miller, and Jay S. Schildkraut. 1997. Methods for generating blue-noise dither matrices for digital halftoning. Journal of Electronic Imaging 6, 2 (1997), 208–230. https://doi.org/10.1117/12.266861
James R. Sullivan, Lawrence A. Ray, and Rodney Miller. 1991. Design of minimum visual modulation halftone patterns. IEEE Transactions on Systems, Man, and Cybernetics 21, 1 (Jan 1991), 33–38. https://doi.org/10.1109/21.101134
Robert A. Ulichney. 1987. Digital Halftoning. Cambridge, Massachusetts.
Robert A. Ulichney. 1993. Void-and-cluster method for dither array generation. In Human Vision, Visual Processing, and Digital Display IV, Jan P. Allebach and Bernice E. Rogowitz (Eds.), Vol. 1913. International Society for Optics and Photonics, SPIE, 332–343. https://doi.org/10.1117/12.152707
Vladimir Volevich, Karol Myszkowski, Andrei Khodulev, and Edward A. Kopylov. 2000. Using the Visual Differences Predictor to Improve Performance of Progressive Global Illumination Computation. ACM Trans. Graph. 19, 2 (2000), 122–161. https://doi.org/10.1145/343593.343611
G. Westheimer. 1986. The eye as an optical instrument. In Handbook of Perception and Human Performance: 1. Sensory Processes and Perception, K.R. Boff, L. Kaufman, and J.P. Thomas (Eds.). Wiley, New York, 4.1–4.20.
Sophie Wuerger, Maliha Ashraf, Minjung Kim, Jasna Martinovic, María Pérez-Ortiz, and Rafał K. Mantiuk. 2020. Spatio-chromatic contrast sensitivity under mesopic and photopic light levels. Journal of Vision 20, 4 (04 2020), 23–23. https://doi.org/10.1167/jov.20.4.23
M. Zwicker, W. Jarosz, J. Lehtinen, B. Moon, R. Ramamoorthi, F. Rousselle, P. Sen, C. Soler, and S.-E. Yoon. 2015. Recent Advances in Adaptive Sampling and Reconstruction for Monte Carlo Rendering. Computer Graphics Forum 34, 2 (2015), 667–681. https://doi.org/10.1111/cgf.12592

A ERROR DECOMPOSITION FOR HORIZONTAL OPTIMIZATION

Substituting $Q(\pi(S)) = \pi(Q(S)) + \Delta$ into Eq. (9), we bound the perceptual error using Jensen's inequality and the discrete Young convolution inequality [Hewitt and Ross 1994]:

$$E(\pi(S)) = \big\|g * \big(\pi(Q(S)) - I + \Delta\big)\big\|_2^2 \tag{17a}$$
$$= 4\,\big\|0.5\,g * \big(\pi(Q(S)) - I\big) + 0.5\,g * \Delta\big\|_2^2 \tag{17b}$$
$$\leq 2\,\big\|g * \big(\pi(Q(S)) - I\big)\big\|_2^2 + 2\,\|g\|_1^2\,\|\Delta\|_2^2. \tag{17c}$$

The first term in Eq. (17c) involves pixel permutations in the readily available estimated image $Q(S)$. In the second term we make an approximation that avoids rendering invocations: $\|\Delta\|_2^2 \approx \sum_i d(i, \pi(i))$, where $d(i,j)$ measures the dissimilarity between the light-transport integrals of pixels $i$ and $j$. Dropping the constant 2, we take the resulting bound as the optimization energy $E_d$ in Eq. (10b).
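For illustration, the resulting energy can be evaluated directly. The following Python sketch is our own illustration, not the paper's implementation: it assumes single-channel images, a zero-padded 'same'-size convolution, and a caller-supplied dissimilarity function `d`.

```python
import numpy as np

def correlate_same(img, g):
    """'Same'-size zero-padded correlation; equals convolution for symmetric g."""
    kh, kw = g.shape
    pad = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(img.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += g[dy, dx] * pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def horizontal_energy(Q, I, perm, g, d):
    """E_d: perceptual term on the permuted estimate, plus the dissimilarity
    penalty approximating ||g||_1^2 * ||Delta||_2^2 (Eq. (17c), constant dropped)."""
    Q_perm = Q.reshape(-1)[perm].reshape(Q.shape)  # pi applied to pixel values
    perceptual = np.sum(correlate_same(Q_perm - I, g) ** 2)
    g1_sq = np.sum(np.abs(g)) ** 2
    penalty = g1_sq * sum(d(i, p) for i, p in enumerate(perm))
    return perceptual + penalty
```

Minimizing this energy over permutations, e.g. by trial swaps, is the horizontal optimization referred to above.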

B SURROGATE CONFIDENCE CONTROL

Here we extend our perceptual error formulation to account for deviations of the surrogate image $I'$ from the ground truth $I$. We introduce a parameter $\mathcal{C} \in [0, 1]$ that encodes our confidence in the quality of the surrogate and instructs the optimization how closely to fit to it. Given an initial image estimate $Q_{\text{init}}$ (the per-pixel estimate average for vertical optimization), we look to optimize for $Q$. We begin with an artificial expansion:

$$\sqrt{E} = \big\|g * Q - h * I\big\|_2 \tag{18a}$$
$$= \big\|(1 - \mathcal{C})(g * Q - g * Q_{\text{init}}) + \mathcal{C}(g * Q - h * I') + (1 - \mathcal{C})(g * Q_{\text{init}} - h * I) + \mathcal{C}(h * I' - h * I)\big\|_2. \tag{18b}$$

Using the triangle inequality we then obtain the following bound:

$$\sqrt{E} \leq \big\|(1 - \mathcal{C})(g * Q - g * Q_{\text{init}}) + \mathcal{C}(g * Q - h * I')\big\|_2 + \big\|(1 - \mathcal{C})(g * Q_{\text{init}} - h * I) + \mathcal{C}(h * I' - h * I)\big\|_2. \tag{19}$$

The second term on the right-hand side can be dropped as it is independent of the optimization variable $Q$. We bound the square of the first term using Jensen's and Young's convolution inequalities:

$$\big\|(1 - \mathcal{C})(g * Q - g * Q_{\text{init}}) + \mathcal{C}(g * Q - h * I')\big\|_2^2 \leq \tag{20a}$$
$$(1 - \mathcal{C})\,\|g\|_1^2\,\big\|Q - Q_{\text{init}}\big\|_2^2 + \mathcal{C}\,\big\|g * Q - h * I'\big\|_2^2. \tag{20b}$$

We take this bound to be our optimization energy in Eq. (16), noting that the squared norm in the second term is the original energy with the surrogate $I'$ substituted for the ground truth $I$.

If a confidence map $\mathcal{C}$ is available (e.g., as a byproduct of denoising), the minimization can be done with per-pixel control:

$$E_{\mathcal{C}} = \|g\|_1^2\,\big\|\sqrt{1 - \mathcal{C}} \odot (Q - Q_{\text{init}})\big\|_2^2 + \big\|\sqrt{\mathcal{C}} \odot (g * Q - h * I')\big\|_2^2. \tag{21}$$
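As an illustration of the per-pixel energy, the following minimal single-channel Python sketch (our own illustration with a zero-padded 'same' convolution, not the authors' implementation) evaluates it for given images and a confidence map:

```python
import numpy as np

def correlate_same(img, g):
    """'Same'-size zero-padded correlation; equals convolution for symmetric g."""
    kh, kw = g.shape
    pad = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(img.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += g[dy, dx] * pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def confidence_energy(Q, Q_init, I_surr, conf, g, h):
    """E_C: per-pixel blend between staying close to the input estimate
    (low confidence) and fitting the surrogate (high confidence)."""
    g1_sq = np.sum(np.abs(g)) ** 2
    stay = g1_sq * np.sum((np.sqrt(1.0 - conf) * (Q - Q_init)) ** 2)
    fit = np.sum((np.sqrt(conf) * (correlate_same(Q, g)
                                   - correlate_same(I_surr, h))) ** 2)
    return stay + fit
```

With `conf` identically 1 the energy reduces to the surrogate-fitting term; with `conf` identically 0 it only penalizes deviating from the initial estimate.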


Supplementary material for:
Perceptual error optimization for Monte Carlo rendering

VASSILLEN CHIZHOV, MIA Group, Saarland University, Max-Planck-Institut für Informatik, Germany
ILIYAN GEORGIEV, Autodesk, United Kingdom
KAROL MYSZKOWSKI, Max-Planck-Institut für Informatik, Germany
GURPRIT SINGH, Max-Planck-Institut für Informatik, Germany

In this supplemental document we discuss various details related to our general formulation from the main paper. We start with a description of the extension of our framework to the a-priori setting (Section 1). In Section 2 we describe a way in which textures can be accounted for in our horizontal approach, so that mispredictions due to multiplicative (and additive) factors are eliminated. In Section 3 we describe ways in which the runtime of iterative energy minimization methods can be improved considerably. Notably, an expression is derived allowing the energy difference due to trial swaps to be evaluated in constant time (no scaling with image size or kernel size). In the remaining sections we analyze how current a-posteriori [Heitz and Belcour 2019] (Section 5) and a-priori [Georgiev and Fajardo 2016; Heitz et al. 2019] (Section 6) state-of-the-art approaches can be related to our framework. Interpretations are discussed, major sources of error are identified, and the assumptions of the algorithms are made explicit.

1 A-PRIORI OPTIMIZATION

We extend our theory to the a-priori setting and discuss the main factors affecting the quality. The quality of a-priori approaches is determined mainly by three factors: the energy, the search space, and the optimization strategy. We discuss each of those briefly in the following paragraphs.

Our energy. We extend the a-posteriori energy from the main paper in order to handle multiple estimators involving different integrands $Q_1, \ldots, Q_T$, with associated weights $w_1, \ldots, w_T$:

$$E(S) = \sum_{t=1}^{T} w_t\,\big\|g * Q_t(S) - I_t\big\|^2. \tag{1}$$

In the above, $g$ would typically be a low-pass kernel (e.g., Gaussian), and $I_t$ is the integral of the function used in the estimator $Q_t$. Through this energy a whole set of functions can be optimized for, in order for the sequence to be more robust to different scenes and estimators that do not fit any of the considered integrands exactly. We note that the derived optimization in Section 3 below is also applicable to the minimization of the proposed energy.

Search space. The search space plays an important role for the qualities which the optimized sequences exhibit. A more restricted search space provides more robustness and may help avoid overfitting to the considered set of integrands.

For instance, sample sets may be generated randomly within each pixel. Then, their assignment to pixels may be optimized over the space of all possible permutations. This is the setting of horizontal methods. If additionally this assignment is done within each dimension separately, it allows for an even better fit to the integrands in the energy (but may degrade the general integration properties of the sequence). The scrambling-key search space in [Heitz et al. 2019] is a special case of the latter, applied to the Sobol sequence.

Constraining the search space to points generated from low-discrepancy sequences provides further robustness and guarantees desirable integration properties of the considered sequences. Similarly to [Heitz et al. 2019], we can consider a search space of Sobol scrambling keys in order for the optimized sequence to have low discrepancy.

Ideally, such integration properties should arise directly from the energy. However, in practice the scene integrand cannot be expected to exactly match the set of considered integrands, so extra robustness is gained through the restriction. Additionally, optimizing for many dimensions at the same time is costly, as noted in [Heitz et al. 2019], so imposing low-discrepancy properties also helps in that regard.

Finally, strict search-space constraints severely restrict the achievable error distribution. This can be alleviated by expressing the restrictions as soft penalty terms in the energy, which allows trading off blue-noise distribution against integration quality, for example.

©2022 Copyright held by the owner/author(s). Publication rights licensed to ACM. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Transactions on Graphics, https://doi.org/10.1145/3504002.

Progressive rendering. In order to make the sequence applicable to progressive rendering, subsets of samples should be considered in the optimization. Given a sample set $S_i$ for pixel $i$, we can decompose it into nested sample sets of $1, \ldots, N$ samples: $S_{i,1} \subset \ldots \subset S_{i,N} \equiv S_i$. We denote the respective images of sample sets $S_1, \ldots, S_N$.

Then an energy that also optimizes for the distribution of the error at each sample count is:

$$E(S) = \sum_{t=1}^{T} \sum_{k=1}^{N} w_{t,k}\,\big\|g * Q_t(S_k) - I_t\big\|^2. \tag{2}$$

If $w_{t,k}$ are set to zero for $k < N$ then the original formulation is recovered. The more general formulation imposes additional constraints on the samples, thus the quality at the full sample count may be compromised if we also require good quality at lower sample counts.
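A minimal single-channel sketch of this progressive energy follows; it is our own illustration, and the `estimate` callback that turns a prefix sample image $S_k$ into the estimate image $Q_t(S_k)$ is hypothetical:

```python
import numpy as np

def correlate_same(img, g):
    """'Same'-size zero-padded correlation; equals convolution for symmetric g."""
    kh, kw = g.shape
    pad = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(img.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += g[dy, dx] * pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def progressive_energy(estimate, sample_images, refs, weights, g):
    """Sum over integrands t and prefix sample counts k of
    weights[t][k] * || g * Q_t(S_k) - I_t ||^2."""
    E = 0.0
    for t, I_t in enumerate(refs):
        for k, S_k in enumerate(sample_images):  # S_1 subset ... subset S_N
            err = correlate_same(estimate(t, S_k) - I_t, g)
            E += weights[t][k] * np.sum(err ** 2)
    return E
```

Setting all weights for $k < N$ to zero recovers the full-sample-count energy, as noted above.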

Choosing samples from $S_i$ for $S_{i,1}, \ldots, S_{i,N-1}$ (in each dimension) constitutes a vertical search space analogous to the one discussed in the main paper for a-posteriori methods. The ranking-key optimization in [Heitz et al. 2019] is a special case of this search space for the Sobol sequence.

arXiv:2012.02344v6 [cs.GR] 5 Apr 2022


Adaptive sampling can be handled by allowing a varying number

of samples per pixel, and a corresponding energy derived from the

one above. Note that this poses further restrictions on the achievable

distribution.

Optimization strategies. Typically, the energies for a-priori methods have been optimized through simulated annealing [Georgiev and Fajardo 2016; Heitz et al. 2019]. Metaheuristics can reach very good minima, especially if runtime is not of great concern, which is the case here since the sequences are precomputed. Nevertheless, the computation still needs to be tractable. The energies in previous works are generally not cheap to evaluate. On the other hand, our energies, especially with the optimizations in Section 3, can be evaluated very efficiently. This keeps the runtime of metaheuristics manageable and allows more complex search spaces to be considered.

Implementation details. Implementation decisions in a renderer, such as how samples are consumed or how they are mapped to the hemisphere and light sources, affect the estimator $Q$. This matters especially when choosing $Q$ for the described energies to optimize a sequence: even very small implementation changes can make a previously ideal sequence useless for a specific renderer. This should be kept in mind both when optimizing sequences with the proposed energies and when using them in a renderer.

2 TEXTURE DEMODULATION FOR HORIZONTAL OPTIMIZATION

Our iterative energy-minimization algorithms (Alg. 1, Alg. 2, main paper) directly work with the original energy formulation, unlike error-diffusion and dither-matrix halftoning which only approximately minimize the energy. This allows textures to be handled more robustly compared to the permutation approach of Heitz and Belcour.

Reducing misprediction errors. Our horizontal approach relies on a dissimilarity metric $d(\cdot,\cdot)$ which approximates terms involving the difference $\Delta$ due to swapping sample sets instead of pixels. This difference can be decreased, so that $d$ is a better approximation, if additional information is factored out in the energy: screen-space varying multiplicative and additive terms. Specifically, if we have a spatially varying multiplicative image $\alpha$ and a spatially varying additive image $\beta$:

$$Q = \alpha \odot Q' + \beta \tag{3}$$
$$\Delta'(\pi) = \alpha \odot Q'(\pi(S)) - \alpha \odot \pi(Q'(S)) \tag{4}$$
$$\Delta(\pi) = Q(\pi(S)) - \pi(Q(S)) = \alpha \odot Q'(\pi(S)) + \beta - \pi\big(\alpha \odot Q'(S) + \beta\big), \tag{5}$$

we can make use of this in our formulation:

$$E(\pi) = \big\|g * Q(\pi(S)) - h * I\big\|_2^2 \tag{6}$$
$$E(\pi) \leq \big\|g * \big(\alpha \odot \pi(Q'(S)) + \beta\big) - h * I\big\|^2 + \|g\|_1^2\,\|\Delta'\|^2. \tag{7}$$

Contrast this to the original formulation, where $\alpha$ and $\beta$ are not factored out:

$$E(\pi) \leq \big\|g * \pi\big(\alpha \odot Q'(S) + \beta\big) - h * I\big\|^2 + \|g\|_1^2\,\|\Delta\|^2. \tag{8}$$

With the new formulation it is sufficient that $Q'(\pi(S)) = \pi(Q'(S))$ for $\Delta'$ to be zero, while originally both $\alpha$ and $\beta$ play a role in $\Delta$ becoming zero. Intuitively, this means that screen-space integrand differences due to additive and multiplicative factors do not result in mispredictions with the new formulation, if the integrand is assumed to be the same (locally) in screen space.

Comparison to demodulation. In the method of Heitz and Belcour the permutation is applied on the albedo-demodulated image. This preserves the property that the global minimum of the implicit energy can be found through sorting. Translated to our framework this can be formulated as follows ($B$ is a blue-noise mask optimized for a kernel $g$):

$$E_{HBP}(\pi) = \big\|\pi(Q'(S)) - I' - B\big\|_2^2 \approx \big\|g * \pi(Q'(S)) - g * I'\big\|_2^2. \tag{9}$$

We have assumed that $\beta$ is zero, but we can also extend the method to handle an additive term $\beta$ as in our case. The more important distinction is that while the albedo-demodulated image $Q'$ is used in the permutation, it is never re-modulated (the $\alpha \odot \cdot$ factor is missing). Thus, this does not allow for proper handling of textures, even if it allows for modest improvements in practice. An example of a failure case is an image $\alpha$ that is close to white noise. Then the error distribution will also be close to white noise due to the missing $\alpha \odot \cdot$ factor. More precisely, even if $\pi(Q'(S)) - I'$ is distributed as $B$, this does not imply that $\alpha \odot \pi(Q'(S)) - I'$ will be distributed similarly. Dropping $\alpha \odot \cdot$ is, however, a reasonable option if one is restricted to sorting as an optimization strategy.

We propose a modification of the original approach (and energy) such that not only the demodulated estimator values are used, but the blue-noise mask $B$ is also demodulated. To better understand how it is derived (and how $\beta$ may be integrated) we study a bound based on the assumption that $\alpha_i \in [0, 1]$ and $\Delta' = 0$:

$$E(\pi) = \big\|g * \big(\alpha \odot \pi(Q'(S)) + \beta\big) - g * I'\big\|_2^2 \tag{10}$$
$$\approx \big\|\alpha \odot \pi(Q'(S)) + \beta - I' - B\big\|_2^2 \tag{11}$$
$$= \sum_i \alpha_i^2 \left(\big(\pi(Q'(S))\big)_i + \frac{\beta_i - I'_i - B_i}{\alpha_i}\right)^2 \tag{12}$$
$$\leq \left\|\pi(Q'(S)) + \frac{\beta - I' - B}{\alpha}\right\|_2^2. \tag{13}$$

The global minimum of the last energy (w.r.t. $\pi$) can also be found through sorting, since there is no spatially varying multiplicative factor $\alpha$ in front of the permutation.
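The sorting step can be sketched as follows; this is our own illustration, assuming scalar pixel values and strictly positive $\alpha$. By the rearrangement inequality, matching sorted estimator values to sorted per-pixel targets minimizes the last energy globally:

```python
import numpy as np

def optimal_permutation(Q_prime, alpha, beta, I_surr, B):
    """Globally minimize || pi(Q') + (beta - I' - B)/alpha ||_2^2 over
    permutations pi by rank matching: the r-th smallest estimator value
    is assigned to the pixel with the r-th smallest target."""
    target = (I_surr + B - beta) / alpha  # value pi(Q')_i should match at pixel i
    order_q = np.argsort(Q_prime.ravel(), kind="stable")
    order_t = np.argsort(target.ravel(), kind="stable")
    perm = np.empty_like(order_q)
    perm[order_t] = order_q  # pixel order_t[r] receives value index order_q[r]
    return perm  # perm[i] = index of the pixel value moved to pixel i
```

Setting $\alpha \equiv 1$ and $\beta \equiv 0$ recovers the sorting of the Heitz–Belcour formulation.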

Sinusoidal textures. To demonstrate texture handling (multiplicative term $\alpha$), in the top row of Fig. 1 a white-noise texture $W$ is multiplied with a sine-wave input signal: $f(x, y) = 0.5\,(1 + \sin(x + y))\,W(x, y)$. The reference is a constant image at $0.5$. Heitz and Belcour proposed to handle such textures by applying their method on the albedo-demodulated image. While this strategy may lead to a modest improvement, it ignores the fact that the image is produced by re-modulating the albedo, which can negate that improvement. Instead,



[Figure panels: Input | Ours | Heitz and Belcour [2019] (no demodulation; demodulation w/ tile size 8; demodulation w/o tiling); rows: multiplicative (top), additive (bottom).]

Fig. 1. We demonstrate the importance of the extension presented in Section 2. A high-frequency sinusoidal texture is corrupted by white noise (leftmost column) multiplicatively (top row) and additively (bottom row). Contrary to Heitz and Belcour's method, our optimization distributes error as a high-quality blue-noise distribution (see the power-spectrum insets). The reference images for the top/bottom rows are respectively a flat grey and a sinusoidal image.

our horizontal iterative minimization algorithm can incorporate the albedo explicitly using the discussed energy.

The bottom row demonstrates the effect of a non-flat signal on the error distribution (additive term $\beta$). Here $W$ is added to a sine-wave input signal: $f(x, y) = 0.5\,(1 + \sin(x + y)) + W(x, y)$. The reference image is $0.5\,(1 + \sin(x + y))$. Our optimization is closer to the reference, suggesting that our method can greatly outperform the current state of the art by properly accounting for auxiliary information, especially in regions with high-frequency textures.

Dimensional decomposition. The additive factor $\beta$ can be used to motivate splitting the optimization over several dimensions, since the Liouville–Neumann expansion of the rendering equation is additive [Kajiya 1986]. If some dimensions are smooth (e.g., lower dimensions), then a screen-space local integrand-similarity assumption can be encoded in $d(\cdot,\cdot)$, and it will approximate $\Delta$ better for smoother dimensions. If the optimization is applied over all dimensions at the same time, this may result in many mispredictions due to the assumption being violated for dimensions in which the integrand is less smooth in screen space (e.g., higher dimensions). We propose splitting the optimization problem, starting from lower dimensions and sequentially optimizing higher dimensions, while encoding a local smoothness (in screen space) assumption on the integrand in $d(\cdot,\cdot)$ (e.g., swaps limited to a small neighborhood around the pixel). This requires solving several optimization problems, but potentially reduces the number of mispredictions. Note that it does not require more rendering operations than usual.

3 IMPROVING ITERATIVE-OPTIMIZATION PERFORMANCE

The main cost of iterative minimization methods is computing the energy for each trial swap, more specifically the required convolution and the subsequent norm computation. In the work of Analoui and Allebach an optimization has been proposed to efficiently evaluate such trial swaps, without recomputing a convolution or norm at each step, yielding a speedup of more than 10 times. That optimization relies on the assumption that the kernel $g$ is the same across screen space, so it is not applicable to spatially varying kernels. We extend the described optimization to a more general case, including spatially varying kernels. We also note some details not mentioned in the original paper.

3.1 Horizontal swaps

We will assume the most general case: instead of just swapping pixels, we consider swapping sample sets from which values are generated through $Q$. This subsumes both swapping pixel values and swapping pixel values in the presence of a multiplicative factor $\alpha$.

Single swap. The main goal is to evaluate the change of the energy $\delta$ due to a swap between the sample sets of some pixels $a, b$. More precisely, if the original sample-set image is $S$, then the new sample-set image is $S'$ such that $S'_a = S_b$, $S'_b = S_a$, and $S'_i = S_i$ everywhere else. This corresponds to images $Q = Q(S)$ and $Q' = Q(S')$. The two images differ only in the pixels with indices $a$ and $b$. Let:

$$\delta_a = Q'_a - Q_a = Q_a(S_b) - Q_a(S_a) \tag{14}$$
$$\delta_b = Q'_b - Q_b = Q_b(S_a) - Q_b(S_b). \tag{15}$$

We will also denote the convolved images $\tilde{Q} = g * Q$ and $\tilde{Q}' = g * Q'$, and the error image $\epsilon = \tilde{Q} - I$. Specifically:

$$\tilde{Q}_i = \sum_{j \in \mathbb{Z}^2} Q_j\, g_{i-j}, \qquad \tilde{Q}'_i = \tilde{Q}_i + \delta_a\, g_{i-a} + \delta_b\, g_{i-b}. \tag{16}$$

We want to be able to efficiently evaluate $\delta = \|\tilde{Q}' - I\|^2 - \|\tilde{Q} - I\|^2$, since in the iterative minimization algorithms the candidate with the minimum $\delta$ is kept. Using the above expression for $\tilde{Q}'_i$ we rewrite $\delta$ as:

$$\delta = \|\tilde{Q}' - I\|^2 - \|\tilde{Q} - I\|^2 \tag{17}$$
$$= \sum_{i \in \mathbb{Z}^2} \big(\tilde{Q}_i - I_i + \delta_a g_{i-a} + \delta_b g_{i-b}\big)^2 - \|\tilde{Q} - I\|^2 \tag{18}$$
$$= 2\sum_{i \in \mathbb{Z}^2} \big\langle \tilde{Q}_i - I_i,\ \delta_a g_{i-a} + \delta_b g_{i-b} \big\rangle + \sum_{i \in \mathbb{Z}^2} \big(\delta_a g_{i-a} + \delta_b g_{i-b}\big)^2 \tag{19}$$
$$= 2\Big\langle \delta_a, \sum_{i \in \mathbb{Z}^2} \epsilon_i g_{i-a} \Big\rangle + 2\Big\langle \delta_b, \sum_{i \in \mathbb{Z}^2} \epsilon_i g_{i-b} \Big\rangle + \Big\langle \delta_a^2, \sum_{i \in \mathbb{Z}^2} g_{i-a}^2 \Big\rangle + \Big\langle \delta_b^2, \sum_{i \in \mathbb{Z}^2} g_{i-b}^2 \Big\rangle + \Big\langle 2\delta_a\delta_b, \sum_{i \in \mathbb{Z}^2} g_{i-a}\, g_{i-b} \Big\rangle \tag{20}$$
$$= 2\big\langle \delta_a, C_{g,\epsilon}(a) \big\rangle + 2\big\langle \delta_b, C_{g,\epsilon}(b) \big\rangle + \big\langle \delta_a^2 + \delta_b^2,\ C_{g,g}(0) \big\rangle + \big\langle 2\delta_a\delta_b,\ C_{g,g}(b - a) \big\rangle, \tag{21}$$

where $C_{f,h}(x) = \sum_{i \in \mathbb{Z}^2} f(i - x)\, h(i)$ is the cross-correlation of $f$ and $h$. We have reduced the computation of $\delta$ to the sum of only 4 terms. Assuming that $C_{g,g}$ is known (it can be precomputed once for a known kernel) and that $C_{g,\epsilon}$ is known (it can be recomputed after a sufficient number of swaps have been accepted), evaluating a trial swap takes constant time (it does not scale with the size of the image or the kernel).
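As a concrete check of this constant-time evaluation, the following self-contained Python sketch (our own illustration: single-channel images, a symmetric kernel, zero-padded 'same' convolution, and swap pixels assumed far enough from the image border that the kernel support lies inside the image) evaluates a trial swap and can be verified against a brute-force energy difference:

```python
import numpy as np

def correlate_same(img, g):
    """'Same'-size zero-padded correlation; equals convolution for symmetric g."""
    kh, kw = g.shape
    pad = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(img.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += g[dy, dx] * pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def kernel_autocorr(g):
    """C_{g,g}(v) for all offsets v; the center index is (kh-1, kw-1)."""
    kh, kw = g.shape
    C = np.zeros((2 * kh - 1, 2 * kw - 1))
    for vy in range(-(kh - 1), kh):
        for vx in range(-(kw - 1), kw):
            s = 0.0
            for iy in range(kh):
                for ix in range(kw):
                    jy, jx = iy - vy, ix - vx
                    if 0 <= jy < kh and 0 <= jx < kw:
                        s += g[jy, jx] * g[iy, ix]
            C[vy + kh - 1, vx + kw - 1] = s
    return C

def swap_delta(da, db, a, b, C_ge, C_gg, center):
    """O(1) energy change for a trial swap at pixels a and b:
    2 da C_ge(a) + 2 db C_ge(b) + (da^2 + db^2) C_gg(0) + 2 da db C_gg(b-a)."""
    off = (b[0] - a[0] + center[0], b[1] - a[1] + center[1])
    inside = 0 <= off[0] < C_gg.shape[0] and 0 <= off[1] < C_gg.shape[1]
    cgg_ab = C_gg[off] if inside else 0.0  # zero outside the autocorr support
    return (2 * da * C_ge[a] + 2 * db * C_ge[b]
            + (da ** 2 + db ** 2) * C_gg[center] + 2 * da * db * cgg_ab)
```

Here `C_ge = correlate_same(correlate_same(Q, g) - I, g)` is refreshed lazily, and `C_gg = kernel_autocorr(g)` is precomputed once.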



Multiple accepted swaps. It may be desirable to avoid recomputing $C_{g,\epsilon}$ even upon accepting a trial swap. For that purpose we extend the strategy from [Analoui and Allebach 1992] for computing $C_{g,\epsilon^n}$, where $\epsilon^n$ is the error image after $n$ swaps have been accepted:

$$\{(\delta_{a_1}, \delta_{b_1}), \ldots, (\delta_{a_n}, \delta_{b_n})\}. \tag{22}$$

This implies $\tilde{Q}^n_i = \tilde{Q}_i + \sum_{k=1}^{n} \big(\delta_{a_k} g_{i-a_k} + \delta_{b_k} g_{i-b_k}\big)$, and consequently:

$$C_{g,\epsilon^n}(x) = \sum_{i \in \mathbb{Z}^2} \Big(\tilde{Q}_i - I_i + \sum_{k=1}^{n} \big(\delta_{a_k} g_{i-a_k} + \delta_{b_k} g_{i-b_k}\big)\Big)\, g_{i-x} \tag{23, 24}$$
$$= C_{g,\epsilon}(x) + \sum_{k=1}^{n} \big(\delta_{a_k}\, C_{g,g}(x - a_k) + \delta_{b_k}\, C_{g,g}(x - b_k)\big). \tag{25}$$

This allows avoiding the recomputation of $C_{g,\epsilon}$ after every accepted swap; instead, the delta on the $(n{+}1)$-st swap with trial differences $\delta_a, \delta_b$ is:

$$\delta_{n+1} = \|\tilde{Q}^{n+1} - I\|^2 - \|\tilde{Q}^n - I\|^2 \tag{26}$$
$$= 2\big\langle \delta_a, C_{g,\epsilon^n}(a) \big\rangle + 2\big\langle \delta_b, C_{g,\epsilon^n}(b) \big\rangle + \big\langle \delta_a^2 + \delta_b^2,\ C_{g,g}(0) \big\rangle + \big\langle 2\delta_a\delta_b,\ C_{g,g}(b - a) \big\rangle, \tag{27}$$

where $C_{g,\epsilon^n}$ is computed from $C_{g,\epsilon}$ and $C_{g,g}$ as derived in Eq. (25). This computation scales only with the number of accepted swaps since the last recomputation of $C_{g,\epsilon}$. We also note that $C_{g,g}(x - y)$ evaluates to zero if $x - y$ is outside of the support of $C_{g,g}$. Additional optimizations have been devised based on this fact [Analoui and Allebach 1992].

3.2 Vertical swaps

In the vertical setting swaps happen only within the pixel itself, that is: $\delta_a = Q_a(S'_a) - Q_a(S_a)$. Consequently, $\tilde{Q}'_i = \tilde{Q}_i + \delta_a\, g_{i-a}$. Computing the difference in the energies for the $(n{+}1)$-st swap:

$$\delta_{n+1} = \|\tilde{Q}^{n+1} - I\|^2 - \|\tilde{Q}^n - I\|^2 \tag{28}$$
$$= \sum_{i \in \mathbb{Z}^2} \big(\tilde{Q}^n_i - I_i + \delta_a g_{i-a}\big)^2 - \|\tilde{Q}^n - I\|^2 \tag{29}$$
$$= 2\sum_{i \in \mathbb{Z}^2} \big\langle \tilde{Q}^n_i - I_i,\ \delta_a g_{i-a} \big\rangle + \sum_{i \in \mathbb{Z}^2} \big(\delta_a g_{i-a}\big)^2 \tag{30}$$
$$= 2\Big\langle \delta_a, \sum_{i \in \mathbb{Z}^2} \epsilon^n_i\, g_{i-a} \Big\rangle + \Big\langle \delta_a^2, \sum_{i \in \mathbb{Z}^2} g_{i-a}^2 \Big\rangle \tag{31}$$
$$= 2\big\langle \delta_a, C_{g,\epsilon^n}(a) \big\rangle + \big\langle \delta_a^2,\ C_{g,g}(0) \big\rangle. \tag{32}$$

The corresponding expression for $C_{g,\epsilon^n}$ is:

$$C_{g,\epsilon^n}(x) = C_{g,\epsilon}(x) + \sum_{k=1}^{n} \delta_{a_k}\, C_{g,g}(x - a_k). \tag{33}$$
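The vertical delta and the lazy update of $C_{g,\epsilon}$ can be sketched analogously; this is again our own illustration (single channel, symmetric kernel, updated pixel assumed away from the image border):

```python
import numpy as np

def correlate_same(img, g):
    """'Same'-size zero-padded correlation; equals convolution for symmetric g."""
    kh, kw = g.shape
    pad = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(img.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += g[dy, dx] * pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def kernel_autocorr(g):
    """C_{g,g}(v) for all offsets v; the center index is (kh-1, kw-1)."""
    kh, kw = g.shape
    C = np.zeros((2 * kh - 1, 2 * kw - 1))
    for vy in range(-(kh - 1), kh):
        for vx in range(-(kw - 1), kw):
            s = 0.0
            for iy in range(kh):
                for ix in range(kw):
                    jy, jx = iy - vy, ix - vx
                    if 0 <= jy < kh and 0 <= jx < kw:
                        s += g[jy, jx] * g[iy, ix]
            C[vy + kh - 1, vx + kw - 1] = s
    return C

def vertical_delta(da, a, C_ge, C_gg, center):
    """O(1) energy change for changing pixel a by da:
       2 da C_ge(a) + da^2 C_gg(0)."""
    return 2 * da * C_ge[a] + da ** 2 * C_gg[center]

def accept_update(C_ge, C_gg, center, a, da):
    """Fold an accepted change into C_{g,eps} in place, touching only
    the autocorrelation support around pixel a: C_ge(x) += da C_gg(x - a)."""
    H, W = C_ge.shape
    ch, cw = center
    for dy in range(-ch, ch + 1):
        for dx in range(-cw, cw + 1):
            y, x = a[0] + dy, a[1] + dx
            if 0 <= y < H and 0 <= x < W:
                C_ge[y, x] += da * C_gg[dy + ch, dx + cw]
```

The update touches only the support of $C_{g,g}$ around the accepted pixel, so its cost is independent of the image size.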

3.3 Multiple simultaneous updates

If the search space is ignored and the formulation is analyzed in an abstract setting, it becomes obvious that the vertical approach corresponds to an update of a single pixel, while the horizontal approach corresponds to an update of two pixels at the same time. This can be generalized further. Let $N$ different pixels be updated per trial, and let there be $n$ trials that have been accepted since $C_{g,\epsilon}$ has been updated. Let the pixels to be updated in the current trial be $a^{n+1}_1, \ldots, a^{n+1}_N$, and the accepted update at step $k$ be at pixels $a^k_1, \ldots, a^k_N$. Let $Q^0 =$