EUROGRAPHICS 2021 / N. Mitra and I. Viola
(Guest Editors)
Volume 40 (2021), Number 2
Interactive Photo Editing on Smartphones
via Intrinsic Decomposition
Sumit Shekhar1, Max Reimann1, Maximilian Mayer1,2, Amir Semmo1,2,
Sebastian Pasewaldt1,2, Jürgen Döllner1, and Matthias Trapp1
1Hasso Plattner Institute for Digital Engineering, University of Potsdam, Germany
2Digital Masterpieces GmbH, Germany
Figure 1: Different types of effects produced with our mobile app: (a) input, (b) Tattoo, (c) Glass, (d) Mystique, (e) Divine, (f) Cartoon. It is the first app that supports a wide variety of image manipulation tasks within a unified framework, which is based on intrinsic image decomposition.
Abstract
Intrinsic decomposition refers to the problem of estimating scene characteristics, such as albedo and shading, when one view
or multiple views of a scene are provided. The inverse problem setting, where multiple unknowns are solved given a single
known pixel-value, is highly under-constrained. When provided with correlating image and depth data, intrinsic scene decom-
position can be facilitated using depth-based priors, which nowadays is easy to acquire with high-end smartphones by utilizing
their depth sensors. In this work, we present a system for intrinsic decomposition of RGB-D images on smartphones and the
algorithmic as well as design choices therein. Unlike state-of-the-art methods that assume only diffuse reflectance, we consider
both diffuse and specular pixels. For this purpose, we present a novel specularity extraction algorithm based on a multi-scale
intensity decomposition and chroma inpainting. Subsequently, the diffuse component is further decomposed into albedo and shading
components. We use an inertial proximal algorithm for non-convex optimization (iPiano) to ensure albedo sparsity. Our GPU-
based visual processing is implemented on iOS via the Metal API and enables interactive performance on an iPhone 11 Pro.
Further, a qualitative evaluation shows that we are able to obtain high-quality outputs. Furthermore, our proposed approach
for specularity removal outperforms state-of-the-art approaches for real-world images, while our albedo and shading layer de-
composition is faster than the prior work at a comparable output quality. Manifold applications such as recoloring, retexturing,
relighting, appearance editing, and stylization are shown, each using the intrinsic layers obtained with our method and/or the
corresponding depth data.
CCS Concepts
• Computing methodologies → Image-based rendering; Image processing; Computational photography
1. Introduction
On a bright sunny day, it is quite easy for us to identify objects
like a wall, a car, or a bike irrespective of their color, material,
or whether they are partially shaded. This remarkable capacity of
the human visual system (HVS) to disentangle visual ambiguities due
to color, material, shape, and lighting is a result of many years of
evolution [BBS14]. Replicating this ability for machine vision—to
enable better scene understanding—has been a widely researched
topic, but it has remained challenging because of its ill-posed and
under-constrained nature.
The physical formation of an image involves various unknowns
at macroscopic and microscopic levels, and decomposing them al-
together makes it ill-posed. A more relaxed approximation is given
by the Dichromatic Reflection Model, where an image $I$ is assumed
to be composed of the sum of specular ($I_s$) and diffuse ($I_d$) compo-
nents at every pixel location $\mathbf{x}$ [Sha85]:
$$I(\mathbf{x}) = I_d(\mathbf{x}) + I_s(\mathbf{x}) \tag{1}$$
The diffuse component ($I_d$) can be further expressed as the product
of albedo ($A$) and shading ($S$) [BT78]:
$$I_d(\mathbf{x}) = A(\mathbf{x}) \cdot S(\mathbf{x}) \tag{2}$$
However, even this approximation is under-constrained, because
three unknowns—$A(\mathbf{x})$, $S(\mathbf{x})$, and $I_s(\mathbf{x})$—need to be solved given
only the image color $I(\mathbf{x})$. In this work, we propose a novel
smartphone-based system to extract intrinsic layers of albedo, shad-
ing and specularity. In our system, the specularity removal is car-
ried out as a pre-processing step followed by a depth-based energy
minimization for computing the other two layers. The computed
layers, apart from offering better scene understanding, facilitate a
range of image-editing applications such as recoloring, retexturing,
relighting, and appearance editing (Fig. 1).
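To make the layer model concrete, the forward direction of Eqns. 1 and 2 can be written as a minimal numpy sketch (the function name and array layout are our own; shading is treated as scalar, as we assume later in Sec. 3.2). Per pixel, seven unknowns (three albedo channels, one shading value, and three specular channels) must explain only three observed color values, which is why the inverse direction is under-constrained.

```python
import numpy as np

def recompose(albedo, shading, specular):
    """Forward image formation (Eqns. 1-2): I = A * S + I_s.
    albedo, specular: (H, W, 3) color layers; shading: (H, W) scalar layer."""
    return albedo * shading[..., None] + specular
```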
Compared to many previous works, ours is not limited by the as-
sumption of purely diffuse reflectance. In general, the decomposition of
an image into diffuse reflectance (albedo) and shading is referred
to as Intrinsic Image Decomposition (IID). The existing IID algo-
rithms can be broadly classified into two categories:
Learning-based methods: the priors on albedo and shading are
incorporated as loss functions, and the decomposition is learned
by training. In the past few years—with the significant improve-
ment in deep-learning technology—such methods have become
quite popular [ZKE15,KPSL16,CZL18,LVv18]. However, cap-
turing real-world training data for IID is challenging and the
existing datasets might not be sufficient [GJAF09,BHK16,
SBZ18]. Unsupervised learning does not require any train-
ing data, however, the results are generally of inferior quality
[LVVG18,MCZ18,LS18]. Most learning-based models have
high GPU memory consumption, making them potentially un-
suitable for mobile devices—especially at those image resolu-
tions that an image-editing application typically requires. Fur-
thermore, these models are generally not controllable at run-
time, i.e., the decomposition cannot be fine-tuned to the image
at hand, which is a significant limitation for interactive editing
applications.
Optimization-based methods: a cost function based on priors is
minimized to find an approximate solution. Initial techniques
use simplistic priors, which are not suitable for real-world
scenes [TFA05]. More complex priors improve the accuracy
at the cost of increased computational complexity [ZTD12,
BBS14,BM15,WLYY17]. Readily available depth sensors fos-
tered depth-based methods for IID [CK13,JCTL14]. Nowadays,
with depth sensors readily available on mobile devices, a depth-
based intrinsic image decomposition method is a natural choice
for intrinsic-image applications in mobile environments.
A further limitation is that only a few previous methods per-
form both IID and specularity extraction together. Innamorati
et al. [IRWM17] and Shi et al. [SDSY17] employ a learning-based
technique: both of them train and test for single objects but do
not consider a realistic scene with many objects. The algorithm by
Alperovich et al. [AG16] is designed for light-fields but cannot be
used for a single image. The method of Beigpour et al. [BSM18]
is applicable for a single image and, like ours, removes specu-
larities in a pre-processing step. However, for specularity extrac-
tion, they do not consider the chroma channels, which leads to artifacts in
highly saturated image regions. Moreover, their method is an or-
der of magnitude slower than ours. Unlike most of the previous
standalone specularity removal techniques, we showcase our re-
sults on a broad range of realistic images [ABC11]. Because
we treat high- and low-frequency specularities differently, we ob-
tain seamless outputs.
Finally, the processing schemes of many state-of-the-art tech-
niques are comparatively slow (optimization-based and learning-
based), resource-intensive, and limited to low image resolutions
(learning-based). Thus, using an intrinsic decomposition for inter-
active image editing on mobile devices is considered challenging.
We propose a system that provides a more practical approach to
intrinsic decomposition. Specifically, we address the following de-
sign objectives:
Accessibility: a decomposition is provided on readily available
mobile devices with depth sensors.
Speed: all post-capture processing takes at most a few seconds (on
the mobile device) before the edited photo can be viewed, even
when the device is offline. Thus, we cannot delegate processing
to a desktop computer or the cloud.
Interaction: interacting with the decomposition and editing
pipeline is possible in real-time, and the navigation affordances
are fairly obvious.
Quality: the rendered application outputs look (i) plausible with
respect to appearance editing and (ii) aesthetically pleasing for
image-stylization tasks.
To this end, we split our processing pipeline into pre-processing
and image-editing stages, of which the specularity removal and im-
age editing perform at interactive frame rates. Thereby, we pro-
vide the first mobile app that performs intrinsic decomposition in a
unified framework and supports a wide variety of image editing
tasks (Fig. 1). This is technically achieved by utilizing the built-in
depth sensor and dedicated GPU of modern smartphones for real-
time capturing and interactive processing of RGB-D data.
Our contributions are summarized as follows; we propose:
1. A novel, interactive specularity removal method that treats high-
frequency and low-frequency specularities differently, performs
chroma-inpainting to address the problem of missing or little
chromaticity information for saturated pixels, and that is well-
suited for real-world images,
2. A fast and robust system for intrinsic decomposition of RGB-D
images on smartphones that makes use of depth data for local
shading smoothness and enforces albedo ($L_1$-)sparsity by em-
ploying the efficient iPiano optimization solver [OCBP14],
3. A variety of mobile-based applications—to show the ubiquitous
accessibility, speed, and quality of our method—using the given
depth data and/or computed intrinsic layers of albedo, shading,
and specularity.
Figure 2: Flowchart of our complete framework showing the extraction of intrinsic layers (Sec. 3) followed by image editing (Sec. 5): the RGB-D input is split by specularity removal into a specular layer and a diffuse + depth component; intrinsic decomposition yields albedo and shading, which image editing combines with the depth data into the output.
2. Related Work
2.1. Specularity Removal
Some of the earliest methods for specularity removal were based
on color segmentation and were thus not robust against tex-
tures [KSK88,BLL96]. Mallick et al. [MZBK06] introduce a par-
tial differential equation (PDE) in the SUV color space that it-
eratively erodes the specular component. A class of algorithms
use the concept of specular-free image based on chromaticity val-
ues [TI05,SC09]. Yang et al. [YWA10] use a similar approach,
and achieve real-time performance by employing parallel process-
ing. Kim et al. [KJHK13] use a dark channel prior to obtain
specular-free images, followed by an optimization framework. Guo
et al. [GZW18] propose a sparse low-rank reflection model and use
an $L_1$-norm constraint in their optimization to filter specularities. A
broad survey of specularity removal methods is provided by Ar-
tusi et al. [ABC11]. Recently, Li et al. [LLZI17] utilize both im-
age and depth data for removing specularity from human facial im-
ages. Most of these methods, however, are evaluated on specific
objects or scene settings and do not consider generic
real-world images. A recent method by Fu et al. [FZS19] aims
to address this issue; the authors assume that specularity is gener-
ally sparse and the diffuse component can be expressed as a linear
combination of basis colors. They present a wide range of results;
however, solving the optimization is comparatively slow and lim-
ited to low-resolution images. By contrast, our method is aimed at
generic real-world high-resolution images with interactive perfor-
mance on mobile devices.
2.2. Intrinsic Image Decomposition
The term intrinsic decomposition was introduced in the litera-
ture by Barrow and Tenenbaum [BT78]. The Retinex theory by
Land and McCann [LM71] proved to be a crucial finding, which be-
came a prior in many subsequent algorithms. In
the course of previous decades, intrinsic decomposition algorithms
have been proposed for image [TFA05,BBS14,BM15,ZTD12,
ZKE15,KPSL16,CZL18,MCZ18,LS18,LXR18,LSR20], video
[YGL14,BST14,MZRT16], multiple-views [LBD13,DRC15,
MQD17] and light-fields [GEZ17,AG16,AJSG18,BSM18]. A
survey covering many of these algorithms is provided by Bonneel
et al. [BKPB17]. A particular class of algorithms uses depth as ad-
ditional information for IID. Lee et al. [LZT12] use normals to
impose constraints on shading and also use temporal constraints to
obtain smooth results. Chen and Koltun [CK13] further decompose
shading into direct and indirect irradiance; the authors use depth to
construct position-normal vectors for regularizing them. Hachama
et al. [HGW15] use a single image or multiple RGB-D images to
construct a point cloud. The normal vectors, along with a low-dimen-
sional global lighting model, are used to jointly estimate lighting and
albedo. Similarly, we use depth information to impose local shad-
ing smoothness constraints. However, unlike previous methods, a
pre-processing step of specularity removal makes our method ro-
bust against specular image pixels. Moreover, we employ an effi-
cient iPiano optimization solver [OCBP14] for our fast and robust
mobile-based solution.
3. Method
A pre-processing step removes the specular highlights from the in-
put image (Sec. 3.1); the diffuse component is then decomposed
into albedo and shading layers using an efficient intrinsic decom-
position optimization (Sec. 3.2). The resulting intrinsic layers are
used to showcase various image editing applications (Sec. 5). A
flowchart of our full pipeline is depicted in Fig. 2.
3.1. Specularity Removal Filtering
It has been shown that the perception of lightness and gloss is
related to image statistics and can be altered by modifying the
skewness of sub-bands of the luminance histogram [SLM08]. Our
specularity removal step is motivated by this observation. Further,
to make our method robust against color artifacts, we use the
image intensity $L$ instead of luminance [BSM18]. The chromaticity
$C$ of the input image $I$ (with color channels $R$, $G$, and $B$) is processed
separately to handle missing color information for saturated specular
pixels:
$$L = \sqrt{R^2 + G^2 + B^2}, \qquad C = \frac{I}{L} \tag{3}$$
A flowchart of our specularity removal algorithm is depicted in
Fig. 3; the method broadly consists of the following three major
steps.
Figure 3: Flowchart of our specularity removal pipeline described in Sec. 3.1: the intensity channel is split into high- and low-frequency sub-bands, whose positive coefficients are reduced within the specular-mask region, while missing chroma information is inpainted by iterative bilateral hole-filling; intensity and chroma are then recombined into the diffuse output. Note the chroma inpainting depicted by the inset.
3.1.1. Identification of Specularity
In general, specular reflection increases the intensity of the output
spectrum and, furthermore, makes it more uniform. Both of these
factors are efficiently captured by the unnormalized Wiener entropy
($H$) introduced by Tian and Clark [TC13]. It can concisely be ex-
pressed as the product of the input-image color channels $R$, $G$, and $B$
(refer to Eqns. 1–6 in [TC13] for a detailed derivation):
$$H(I) = R \cdot G \cdot B \tag{4}$$
The proposed unnormalized Wiener (UW) entropy encapsulates the
color-direction-changing and intensity-increasing aspects of spec-
ularities. We can describe a specularity as a region where $H$ of
the total reflection is significantly higher than that of the corresponding
diffuse reflection:
$$H(Tot(\lambda)) - H(Dif(\lambda)) > \tau_0 \;\Longleftrightarrow\; H(Tot(\lambda)) > \tau_0 + H(Dif(\lambda)) \tag{5}$$
where $Tot(\lambda)$ is the spectrum of the total reflection, $Dif(\lambda)$ is the
spectrum of the diffuse component, and $\tau_0$ is a particular threshold.
The UW entropy of the diffuse component is assumed to have
little variation within the scene and is considered a constant. Thus, a
single universal threshold $\tau = \tau_0 + H(Dif(\lambda))$ can be applied to the
UW-entropy map for specular pixel identification. An image pixel
is identified as specular ($S_M$) if $H(Tot(\lambda))$ is above a threshold ($\tau$).
We assume that each image pixel equals the spectrum of the total
reflection (i.e., $H(Tot(\lambda)) = H(I)$); thus the specular mask is given
as:
$$S_M(\mathbf{x}) = \begin{cases} 1, & \text{if } H(I) > \tau \\ 0, & \text{otherwise} \end{cases} \tag{6}$$
For our experiments, $\tau \in (0, 0.5)$ has been empirically determined
to give plausible results (Fig. 4). The above specularity identifica-
tion approach is inspired by the work of Tian and Clark [TC13];
please refer to their work for details.
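For illustration, the mask computation of Eqns. 4–6 reduces to a per-pixel channel product and a threshold; a minimal numpy sketch (the function name and defaults are our own, not part of the mobile implementation):

```python
import numpy as np

def specular_mask(img, tau=0.12):
    """Specular-pixel mask via the unnormalized Wiener entropy (Eqns. 4-6).
    img: float RGB image in [0, 1], shape (H, W, 3)."""
    # UW entropy: product of the color channels (Eqn. 4).
    H = img[..., 0] * img[..., 1] * img[..., 2]
    # Single universal threshold, assuming near-constant diffuse entropy (Eqn. 6).
    return (H > tau).astype(np.float32)
```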
3.1.2. Intensity Reduction of Specular Pixels
Figure 4: Input image and the corresponding specularity mask for increasing values of the threshold τ: (a) input, (b) τ = 0.08, (c) τ = 0.12, (d) τ = 0.17. With a low threshold value, even diffuse pixels are marked as specular; with a higher threshold, some specular pixels are missed.

Highlights (specularities) are efficiently captured by the positive
coefficients of a luminance or intensity sub-band [BBPA15,
BSM18]. For this purpose, we perform multi-scale decompo-
sition of the intensity image (L) by repetitive edge-aware im-
age filtering to obtain an intensity scale-space. In each repetition,
the spatial extent of the edge-aware filter is doubled, producing a
series of images of increasing smoothness. A fast way to achieve
this on an iPhone is to downsample the intensity image and then
perform edge-preserving upsampling (CIEdgePreserveUpsample)
with the original intensity image as guide, while the downsampling
factor is doubled in each repetition. Subsequently, a sub-band (or
frequency band) is obtained
by taking the difference between the current and the next scale. A
straightforward way to reduce the specular component is to scale
the positive coefficients in a sub-band with a constant κ<1. In
principle, the above operation will also erode image regions which
are both, diffuse and bright. We omit such cases by checking for
positive coefficients only within the specular mask (Sec. 3.1.1).
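The following numpy sketch illustrates this sub-band manipulation; OpenCV's bilateral filter stands in for the iOS edge-aware filters, and the function name, level count, and default κ values are our own choices within the ranges given in the next paragraph.

```python
import cv2  # bilateral filter as a generic stand-in for CIEdgePreserveUpsample
import numpy as np

def reduce_specularity(L, mask, kappa_h=0.4, kappa_l=0.3, levels=4):
    """Multi-scale intensity decomposition with positive-coefficient scaling
    (Sec. 3.1.2). L: float32 intensity image; mask: specular mask (Eqn. 6)."""
    bands, current, sigma = [], L.copy(), 4.0
    for _ in range(levels):
        smoothed = cv2.bilateralFilter(current, -1, 0.1, sigma)
        bands.append(current - smoothed)  # sub-band = difference of adjacent scales
        current, sigma = smoothed, sigma * 2.0  # spatial extent doubles per repetition
    out = current  # coarsest residual
    for i, band in enumerate(reversed(bands)):  # coarsest (low-freq) bands first
        kappa = kappa_l if i < levels // 2 else kappa_h
        sift = (band > 0) & (mask > 0)  # positive coefficients, masked region only
        out = out + np.where(sift, kappa * band, band)
    return out
```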
A common observation regarding specularity is its occurrence
as smooth patches of highlights along with some sparse irregu-
larities due to rough object surfaces. To address these two as-
pects of specularity distribution, we reduce the positive coeffi-
cients of high-frequency ($\kappa_h$) and low-frequency ($\kappa_l$) sub-bands
separately (Fig. 5). For all of our experiments, we use values
$0.2 \leq \kappa_h, \kappa_l \leq 0.5$. Even though we use this approach to reduce
specularities, it can be easily extended (by using $\kappa_h, \kappa_l > 1$) to
seamlessly enhance it for appearance editing [BBPA15] (see sup-
plementary material).
3.1.3. Chroma Inpainting of Specular Pixels
For saturated specular pixels, the chromaticity image might have
little or no information. We fill in this missing detail from neigh-
boring pixels using iterative bilateral filtering [TM98]. The initial
chromaticity image with the missing information in specular pixels
is denoted $C^0$, and after $k+1$ iterations the modified image is
given as
$$C^{k+1}(\mathbf{p}) = \frac{1}{W_p} \sum_{\mathbf{q} \in M(\mathbf{p})} G_{\sigma_s}(\|\mathbf{p} - \mathbf{q}\|)\, G_{\sigma_r}(\|C^k(\mathbf{p}) - C^k(\mathbf{q})\|)\, C^k(\mathbf{q}), \tag{7}$$
where the normalization factor $W_p$ is computed as:
$$W_p = \sum_{\mathbf{q} \in M(\mathbf{p})} G_{\sigma_s}(\|\mathbf{p} - \mathbf{q}\|)\, G_{\sigma_r}(\|C^k(\mathbf{p}) - C^k(\mathbf{q})\|). \tag{8}$$
The amount of filtering in each iteration is controlled by the parame-
ters $\sigma_s$ and $\sigma_r$ for image $C^k$. As seen in Eqn. 7, the next iteration
of the chromaticity image is a normalized weighted average of the cur-
rent one, where $G_{\sigma_s}$ is a spatial Gaussian that decreases the contri-
bution of distant pixels and $G_{\sigma_r}$ is a range Gaussian that decreases the
contribution of pixels whose intensity differs from $C^k(\mathbf{p})$. We search
for neighboring pixels in a square pixel window $M(\mathbf{p})$ of side length
5 to 15 pixels. In principle, any sophisticated inpainting algorithm
could be used for this purpose. However, we chose the above proce-
dure because its locality enables parallel processing. The range
of the inpainting parameters is $\sigma_s \in (2, 8)$ and $\sigma_r \in (0.2, 4.0)$.
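A direct, unoptimized numpy transcription of Eqns. 7 and 8 may read as follows; the window and iteration defaults are illustrative, and the on-device version evaluates this per pixel in parallel.

```python
import numpy as np

def inpaint_chroma(C0, mask, window=7, sigma_s=4.0, sigma_r=1.0, iters=10):
    """Iterative bilateral hole-filling of the chroma image (Eqns. 7-8).
    C0: (H, W, 3) chroma; mask: 1 where chroma is missing (specular pixels)."""
    C = C0.copy()
    H, W = mask.shape
    r = window // 2
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    Gs = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma_s ** 2))  # spatial Gaussian
    for _ in range(iters):
        Cn = C.copy()
        for p in zip(*np.nonzero(mask)):  # only specular pixels are updated
            y0, y1 = max(p[0] - r, 0), min(p[0] + r + 1, H)
            x0, x1 = max(p[1] - r, 0), min(p[1] + r + 1, W)
            patch = C[y0:y1, x0:x1]
            gs = Gs[r - (p[0] - y0):r + (y1 - p[0]), r - (p[1] - x0):r + (x1 - p[1])]
            diff = np.linalg.norm(patch - C[p], axis=-1)
            w = gs * np.exp(-diff ** 2 / (2 * sigma_r ** 2))  # range Gaussian
            Cn[p] = (w[..., None] * patch).sum((0, 1)) / w.sum()  # weighted avg., Eqn. 8
        C = Cn
    return C
```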
3.2. Intrinsic Decomposition of RGB-D Images
In this section, we describe our optimization framework for de-
composition of the resulting diffuse image (Fig. 7). We assume
monochromatic, white illumination similar to previous IID meth-
ods; thus, shading is scalar-valued, and the image intensity $L$ (Eqn. 3) is
used as the shading initialization for the optimization framework. Ini-
tial albedo is defined accordingly using Eqn. 2. We logarithmically
linearize the constraints to enable simpler optimization strategies,
a common practice in previous methods [BKPB17]:
$$i_d(\mathbf{x}) = a(\mathbf{x}) + s(\mathbf{x}) \tag{9}$$
In the above formulation, the lower-case letters $i_d$, $a$, and $s$ de-
note the log values of $I_d$, $A$, and $S$, respectively, at pixel location $\mathbf{x}$. In
order to avoid log indeterminacy at values close to zero, we add an
offset in the logarithm computation, i.e., $i_d = \log(I_d + \varepsilon)$; for all our
experiments we set $\varepsilon = 1.4$. We enforce the constraints per color
channel in the log-domain, i.e., $i_d[c] = a[c] + s$ for $c \in \{R, G, B\}$.
For our decomposition, we solve for both $a$ and $s$ simultaneously
by minimizing the energy function
$$E(\mathbf{x}) = \frac{1}{2}\Big(\lambda_d E_d(\mathbf{x}) + \lambda_{ra} E_{ra}(\mathbf{x}) + \lambda_{rs} E_{rs}(\mathbf{x})\Big) + \lambda_{sp}\,\|a(\mathbf{x})\|_1 \tag{10}$$
where $\lambda_d E_d$, $\lambda_{ra} E_{ra}$, and $\lambda_{rs} E_{rs}$ are the data, retinex-albedo smooth-
ness, and retinex-shading smoothness terms, respectively, with their
corresponding weights. We use an $L_1$ regularizer to enforce sparsity
in the resulting albedo, controlled by the weight $\lambda_{sp}$.
Figure 5: Effect of high-frequency (HF) and low-frequency (LF) specularity removal on an input image: (a) input image, (b) only HF specularity removed, (c) only LF specularity removed, (d) diffuse image (HF and LF specularity removed).
3.2.1. Data Term
The data term ensures that the image is equal to the sum of resulting
albedo and shading in the log-domain. To make the solution robust,
this term is weighted by pixel intensity to avoid contributions from
noisy low-intensity pixels:
$$E_d(\mathbf{x}) = L(\mathbf{x})\,\|i(\mathbf{x}) - s(\mathbf{x}) - a(\mathbf{x})\|^2 \tag{11}$$
We minimize the energy function (Eqn. 10) with respect to albedo
and shading separately using an iterative solver. The data term is the
only term that contributes to the energy gradient w.r.t. both albedo
and shading, thus coupling the two minimizations. The weight of
this energy term is controlled by $\lambda_d \in (0.005, 0.05)$.
3.2.2. Retinex Terms
The Retinex Theory [LM71] forms the basis of many intrinsic de-
composition techniques [BKPB17]. It imposes priors on how edges
vary differently for albedo and shading. Most of the existing meth-
ods assume that an image edge is either an albedo or a shading edge.
However, this is not always true and an edge can be present due
to both albedo and shading. Moreover, we can identify the shad-
ing edges efficiently using the given depth data. Thus, we utilize
the Retinex theory and impose constraints on albedo and shading
smoothness separately.
Albedo Smoothness. Ideally, an albedo image should be piece-
wise smooth. A straightforward way to achieve this is to perform
edge-preserving smoothing. We employ a weighting function to
identify and prevent smoothing at prominent albedo edges,
$$E_{ra}(\mathbf{x}) = \sum_{\mathbf{y} \in N(\mathbf{x})} w_a(\mathbf{x}, \mathbf{y})\,\|a(\mathbf{x}) - a(\mathbf{y})\|^2 \tag{12}$$
The edge weight is controlled by a parameter αra, where a rela-
tively higher value ensures texture preservation,
$$w_a(\mathbf{x}, \mathbf{y}) = \exp\!\big(-\alpha_{ra}\,\|a(\mathbf{x}) - a(\mathbf{y})\|^2\big) \tag{13}$$
For all our experiments, we use $\alpha_{ra} \in (5.0, 20.0)$ and consider a
$3 \times 3$ pixel neighborhood $N(\mathbf{x})$ around pixel $\mathbf{x}$. The weight of
the energy term is regulated by $\lambda_{ra} \in (2.0, 40.0)$.
Shading Smoothness. Ideally, a shading image should be smooth
except for discontinuities due to irregular scene geometry or indi-
rect illumination (such as inter-reflections and shadows). We as-
sume only direct-illumination and ignore discontinuities due to the
latter. By only taking scene geometry into consideration, we ex-
pect two scene points to have similar shading if they have similar
position and normal vectors [RH01]. The position vectors are con-
structed as $[x, y, z]^\top$, where $x, y$ are pixel coordinates and $z$ is the
corresponding depth. The normal vector $[n_x, n_y, n_z]^\top$ is constructed
using the depth $D(\mathbf{x})$ as
$$\mathbf{n} = [-\nabla_x D,\; -\nabla_y D,\; 1.0]^\top \tag{14}$$
$\nabla_x D$ and $\nabla_y D$ represent depth gradients in the horizontal and ver-
tical directions. The normalized position vector and normal vec-
tor are combined to construct a feature vector $\mathbf{f}$ (for a given pixel
$\mathbf{x}$): $[x, y, z, n_x, n_y, n_z]^\top$. Thus, all pixels are embedded in a six-
dimensional feature space. The distance between two pixels in this
feature space is used to construct a weight map,
$$w_s(\mathbf{x}, \mathbf{y}) = \exp\!\big(-\alpha_{rs}\,\|\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{y})\|^2\big) \tag{15}$$
The above weight preserves shading variations, captured as dis-
tances in feature space, and the overall constraint is formulated as
$$E_{rs}(\mathbf{x}) = \sum_{\mathbf{y} \in N(\mathbf{x})} w_s(\mathbf{x}, \mathbf{y})\,\|s(\mathbf{x}) - s(\mathbf{y})\|^2 \tag{16}$$
Similar to the previous term, $N(\mathbf{x})$ represents the $3 \times 3$ pixel neigh-
borhood around pixel $\mathbf{x}$. The weight is controlled by a parameter
$\alpha_{rs}$; for all our experiments we use $\alpha_{rs} \in (20.0, 200.0)$. The weight
of the energy term is regulated by $\lambda_{rs} \in (15.0, 100.0)$. The
feature space introduced above is based on the work of Chen and
Koltun [CK13]. However, we consider this distance only in a local
neighborhood to increase runtime performance.
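As an illustration, the per-pixel features and the resulting weight to one neighbor can be computed as follows (a numpy sketch; the position normalization and the gradient operator are simplifications relative to the GPU implementation):

```python
import numpy as np

def shading_weights(D, alpha_rs=50.0):
    """6-D features [x, y, z, nx, ny, nz] from depth (Sec. 3.2.2) and the
    shading-smoothness weight to the right-hand neighbor (Eqn. 15)."""
    H, W = D.shape
    y, x = np.mgrid[0:H, 0:W].astype(np.float32)
    gy, gx = np.gradient(D)                        # depth gradients
    n = np.stack([-gx, -gy, np.ones_like(D)], -1)  # normal from depth (Eqn. 14)
    n /= np.linalg.norm(n, axis=-1, keepdims=True)
    p = np.stack([x / W, y / H, D], -1)            # normalized position vector
    f = np.concatenate([p, n], -1)                 # embed pixels in 6-D feature space
    d2 = ((f[:, 1:] - f[:, :-1]) ** 2).sum(-1)     # squared distance to right neighbor
    return np.exp(-alpha_rs * d2)                  # w_s for horizontal neighbors
```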
3.2.3. Optimization Solver
All the energy terms discussed above are smooth and convex ex-
cept for the $L_1$ regularizer, which is specific to albedo. This allows
for a straightforward energy minimization w.r.t. shading. For both
albedo and shading, we minimize the energy iteratively. By using
an iterative solver, we overcome the limitation of storing a large
matrix in memory and calculating its inverse. Moreover, an itera-
tive scheme allows us to stop the solver once we achieve plausible
results. A shading update $s^{k+1}$ is obtained by employing Stochastic
Gradient Descent (SGD) with momentum [Qia99],
$$s^{k+1} = s^k - \alpha\,\nabla E(s^k) + \beta\,(s^k - s^{k-1}) \tag{17}$$
where $\alpha$ and $\beta$ are step-size parameters, $\nabla E$ is the energy gra-
dient w.r.t. shading, and $k$ is the iteration count.
In order to enforce albedo sparsity, we utilize an $L_1$ regularizer
for albedo. The regularizer is convex but not smooth, which makes
the minimization of the energy w.r.t. albedo challenging. The solution
for the class of problems of the form
$$\operatorname*{argmin}_{a \in \mathbb{R}^N}\; g(a) + h(a) \tag{18}$$
where $g(a)$ is smooth and $h(a)$ is non-smooth while both are
convex, is generally given by proximal gradient descent (PGD)
[LM79]. A more efficient way to solve the above is proposed by
Ochs et al. [OCBP14] in their iPiano algorithm with the following
update scheme,
$$a^{k+1} = \underbrace{(I + \alpha\,\partial h)^{-1}}_{\text{backward step}}\Big(\underbrace{a^k - \alpha\,\nabla g(a^k)}_{\text{forward step}} + \underbrace{\beta\,(a^k - a^{k-1})}_{\text{inertial term}}\Big) \tag{19}$$
The step-size parameters $\alpha$ and $\beta$ are the same as in Eqn. 17. The
inertial term makes iPiano more effective than PGD, whose update
scheme comprises only the forward descent step and the backward
proximal mapping. For the special case $h(a) = \lambda\,\|a\|_1$, the proximal
operator is given by soft thresholding,
$$(I + \alpha\,\partial h)^{-1}(u) = \max\{|u| - \alpha\lambda,\, 0\} \cdot \operatorname{sgn}(u) \tag{20}$$
For our problem, the data (Sec. 3.2.1) and retinex (Sec. 3.2.2) terms are
smooth, and their sum replaces $g$ in Eqn. 18. The $L_1$ regulariza-
tion is achieved with $h = \lambda_{sp}\,\|a\|_1$. The regularized albedo is solved
for iteratively using Eqns. 19 and 20. For most of our experiments,
$\alpha = 0.003$, $\beta = 0.015$, and $\lambda_{sp} = 0.15$ yield plausible results.
Our stopping criterion is a trade-off between performance and ac-
curacy; we do not compute the energy residual for this purpose. We
aim to achieve close-to-interactive performance with visually convinc-
ing application results. To this end, we empirically determined 100
iterations to be a sufficient approximation (Fig. 6).
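For reference, the albedo update of Eqns. 19 and 20 condenses to a few lines; grad_g below stands for the gradient of the smooth data and retinex terms, which we leave abstract here (the callable and function names are our own):

```python
import numpy as np

def ipiano_albedo(a0, grad_g, alpha=0.003, beta=0.015, lam_sp=0.15, iters=100):
    """iPiano iterations for the albedo sub-problem (Eqns. 19-20).
    grad_g: callable returning the gradient of the smooth energy g at a."""
    a_prev = a = a0.copy()
    for _ in range(iters):
        u = a - alpha * grad_g(a) + beta * (a - a_prev)  # forward + inertial step
        a_prev = a
        # backward step: proximal map of the L1 term, i.e., soft thresholding
        a = np.maximum(np.abs(u) - alpha * lam_sp, 0.0) * np.sign(u)
    return a
```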
4. Evaluation
We evaluate our approach on a variety of real-world images and
ground truth data. We perform qualitative comparisons with recent
methods and quantitative evaluations with existing datasets for both
specularity removal and intrinsic decomposition.
Specularity Removal. We compare our method against recent
specularity removal techniques by Fu et al. [FZS19], Akashi
et al. [AO16], Yang et al. [YWA10], and Shen et al. [SC09]. For
the method of Fu et al., the results were generously provided by
the authors, and for the others we use the implementations by Vítor
Ramos [Ram20] to generate the results. We observe that most of the
existing specularity removal techniques are not well suited for real-
world images. The method by Fu et al., which is especially tailored
for real-world scenarios, also struggles to handle high-resolution im-
ages. Our proposed algorithm performs better than state-of-the-art
works for natural images (Fig. 7) and is comparable to them in
a controlled lab setting (see supplementary material). Moreover,
our method works at interactive rates on a mobile device for high-
resolution images. Please refer to the supplemental material for
how the intermediate steps improve the output quality.
Note that the comparisons for specularity removal are performed
using the desktop-based implementation of our algorithm, which
makes use of guided image filtering for multi-scale decomposition
of image intensity. For our mobile version, we replace guided fil-
tering with built-in edge-aware filters on iOS (iPhone) to achieve in-
teractive performance while compromising slightly on quality.
Table 1: Quantitative evaluation for intrinsic decomposition (pixel values are scaled between 0 and 1); the lower the error value, the better.

                    MSE                              DSSIM
Dataset      Ours   Bell   Lettry  Jeon       Ours   Bell   Lettry  Jeon
LFID         0.075  0.056  0.012   0.085      0.191  0.144  0.158   0.274
MPI-Sintel   0.145  0.041  0.044   0.042      0.325  0.244  0.253   0.288
Intrinsic Decomposition. We compare our intrinsic decomposi-
tion results with an RGB-based (Bell et al. [BBS14]), an RGB-D-based
(Jeon et al. [JCTL14]), and a learning-based (Lettry et al. [LVVG18])
technique to cover a broad range of methods. We use the implemen-
tations provided by the authors. Our results are comparable to the
above methods (Fig. 12). Note that the methods of Bell et al. and
Jeon et al. are an order of magnitude slower than ours on
a GPU-enabled desktop system. Moreover, unlike ours, the quality
of their results is not consistent across indoor and outdoor scenes:
they perform quite well for indoor scenes, but their output quality
degrades significantly for outdoor scenes (see supplementary ma-
terial). Even though the runtime of Lettry et al. is comparable to
that of our mobile-phone-based technique, we perform comparatively
better in terms of output quality.
Quantitative Evaluation. For a quantitative evaluation, we re-
quire a dataset that includes ground truth depth, albedo, shad-
ing, and specularity. To this end, we use the Light-Field Intrin-
sic Dataset (LFID) [SBZ18]. We also test only the intrinsic de-
composition component of our approach on the MPI-Sintel dataset
[BWSB12]. We use MSE and DSSIM as error metrics when com-
paring the computed albedo (for intrinsic decomposition evalua-
tion) and diffuse image (for specularity removal evaluation) with
the respective ground truth. We compare our intrinsic decomposi-
tion results with the other methods (specified in Fig. 12) in Tab. 1. For
the MPI-Sintel case, we consider one frame from each of the scenes,
and for LFID we use three views from the Street Guitar and Wood
Metal light fields. Our method performs comparatively better on
LFID than on the MPI-Sintel dataset because the modeling assumptions
of LFID are similar to ours and physically more accurate. For
specularity removal, we employ the desktop implementation of our
approach and achieve MSE and DSSIM values of 0.001 and 0.018,
respectively.
Run-time Performance. Our whole processing pipeline has been
implemented on an iPhone 11 Pro smartphone running the
iOS 13 operating system with an Apple A13 Bionic processor and
4 GB of RAM. We make use of Apple's Metal API for GPU-based
processing. The captured image is downscaled by a factor of 0.3 for
interactive performance while maintaining sufficient quality. The
resulting image resolution is 1128 × 1504 pixels, and the corre-
sponding depth map has a resolution of either 480 × 640 pixels for the
front-facing TrueDepth sensor or 240 × 320 pixels for the back-cam-
era passive-stereo setup. We scale the depth map using built-in fil-
ters to match the image resolution for consistent processing. On av-
erage, the pre-processing step of specularity removal takes 0.1 sec-
onds. For solving the optimization described in Sec. 3.2, we employ
an iterative solver and analyze its performance with an increase in
number of iterations for two kernel resolutions of 3 ×3 and 5 ×5
pixels. Our goal is to achieve visibly plausible results with interac-
Figure 6: Performance of the iterative optimization solver (execution time in seconds vs. number of iterations) for kernel widths of 3 px and 5 px. The values are averaged over seven runs.
tive processing. We empirically determine 100 iterations as a good
trade-off for the above requirement with an execution time of 1.5
seconds for a 3 × 3 pixel kernel (Fig. 6). Our mate-
rial editing pass requires computing sub-bands in a pre-processing
stage for each intrinsic layer, which takes 3.5 seconds. Subse-
quently, the editing is interactive. The other application com-
ponents run interactively, allowing for seamless editing.
5. Applications
A perfect, physically accurate editing of a photo would require
full inverse rendering with high precision. However, one can
achieve convincing material [BSM18,KRFB06] and volumetric
media [NN03] editing even without the above. The intrinsic de-
composition output can also be effectively used for enhancing im-
age stylization results [MZRT16]. The following applications in
our work are based on the above observations.
5.1. Material Appearance Editing
Our material editing framework is based on the work of Beigpour
et al. [BSM18], where the authors modify the intensity of albedo,
shading, and specularity using band-sifting filters [BBPA15]. The
modified intrinsic layers are merged to form the output image ($I_{out}$)
with edited appearance,
$$I_{out} = A(r_1 m_1 g_1, \eta_1) \cdot S(r_2 m_2 g_2, \eta_2) + I_s(r_3 m_3 g_3, \eta_3) \tag{21}$$
where $r_i m_i g_i$ with $i \in \{1, 2, 3\}$ denotes the band-sifted intensity
component of the respective intrinsic layer, i.e., $A$, $S$, and $I_s$ (described
in Eqns. 1 and 2). The component categorization is based on the
following signal attributes: spatial frequency ($r$), magnitude ($m$),
and sign ($g$). Only a predefined set of sub-categories is defined:
$r_i \in \{H, L, A\}$, $m_i \in \{H, L, A\}$, $g_i \in \{P, N, A\}$, where $H$ and $L$
denote the high and low frequency/magnitude range, $P$ and $N$ represent
positive and negative values, and $A$ denotes "all", i.e., the complete
category. The amount of sifting is controlled by the scaling factor
$\eta_i$: we can boost ($\eta_i > 1$), reduce ($0 < \eta_i < 1$), or invert ($\eta_i < 0$)
the selected component.
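A sketch of a single band-sifting operation on one intrinsic-layer intensity; the Gaussian split and the median-based magnitude threshold are our own simplifications of the multi-scale decomposition used by Boyadzhiev et al. [BBPA15]:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def band_sift(layer, freq='H', mag='A', sign='P', eta=1.5, sigma=4.0):
    """One band-sift r_i m_i g_i with scaling eta_i (Eqn. 21): select a
    frequency band, restrict it by magnitude and sign, and scale it."""
    low = gaussian_filter(layer, sigma)
    band = layer - low if freq == 'H' else (low if freq == 'L' else layer)
    rest = layer - band
    sel = np.ones(band.shape, dtype=bool)
    if mag != 'A':  # split high/low magnitude at the median coefficient
        t = np.median(np.abs(band))
        sel &= (np.abs(band) > t) if mag == 'H' else (np.abs(band) <= t)
    if sign != 'A':  # positive or negative coefficients only
        sel &= (band > 0) if sign == 'P' else (band < 0)
    return rest + np.where(sel, eta * band, band)
```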
In our framework, we replace the original manual object-
segmentation with a mask generation step based on machine learn-
Figure 7: Comparison of specularity removal for real-world images. The figure contains the input image and the corresponding diffuse image obtained using our method and the specularity removal methods of Fu et al. [FZS19], Akashi et al. [AO16], Yang et al. [YWA10], and Shen et al. [SC09].
Figure 8: Comparing our translucency effect with Beigpour et al. [BSM18]: (a) input, (b) Beigpour et al., (c) ours.
ing [SHZ18] or iPhone segmentation mattes [FVH19]. We en-
hance their transparency appearance edit by using depth-based tex-
ture warping (Fig. 8). Our framework is also able to introduce new
textures in the albedo layer for the purpose of coherent retextur-
ing (Fig. 13(a) - (c)). Moreover, our editing framework allows for
multiple edit passes, which was not addressed in previous works.
5.2. Atmospheric Appearance Editing
We perform atmospheric editing as de-weathering and relighting in
the form of God rays.

Figure 9: Input image and atmospheric edits with virtual fog: (a) input, (b) low-density fog, (c) high-density fog.

Our de-weathering approach is based on the
work of Narasimhan et al. [NN03], which enables synthesizing an
image-based fog-like appearance. According to their de-weathering
model, the output image ($I_{out}$) can be expressed as a linear combi-
nation of the input image ($I_{in}$) and the brightness of the sky ($F$)
using the given depth data ($D$):
$$I_{out} = I_{in} \cdot \exp(-\theta D) + F \cdot \big(1 - \exp(-\theta D)\big) \tag{22}$$
The scattering parameter $\theta \in (0.2, 7)$ controls the above linear
combination. We further improve the result by using an advanced
atmospheric-scattering model that accounts for absorption, in-scattering,
and out-scattering independently [HP02] (Fig. 9).
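This composite is a one-liner in practice; a minimal sketch of Eqn. 22, assuming a scalar sky brightness F and float images:

```python
import numpy as np

def add_fog(I, D, theta=1.0, F=1.0):
    """De-weathering composite (Eqn. 22): attenuate the input by depth and
    blend toward the sky brightness F; theta is the scattering parameter."""
    t = np.exp(-theta * D)[..., None]  # transmission along each view ray
    return I * t + F * (1.0 - t)
```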
Figure 10: Enhancements and variations of (b) the RGB cartoon stylization effect of (a) the input, using albedo/shading decomposition with (c) a constant shading, and (d) smoothed shading and additional depth edge stylization.
Figure 11: Comparison of shadowing/relighting methods. Here, a portrait with lighting from the back (a, stylized input) is used to showcase the effect of cartoon stylization and re-lighting using (b) a ray-marching-based variant and (c) a normal-angle variant for hard shadows.
Our scene relighting approach is based on the image-based volu-
metric light scattering model of Mitchell [Mit08]. It consists of two
steps: (1) create an occlusion map with respect to a defined point
light source using depth data and (2) subsequently use the occlu-
sion map to cast rays from the light source to every pixel. The use
of an occlusion map creates an appearance of light rays shooting
from the background to simulate the appearance of God rays.
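A numpy sketch of this two-step pass in the spirit of Mitchell [Mit08]; the sample count, decay factor, and function names are our own choices:

```python
import numpy as np

def god_rays(occlusion, light_px, samples=64, decay=0.96):
    """Screen-space light shafts: march from every pixel toward the light
    source through the occlusion map, accumulating decayed samples.
    occlusion: float (H, W) map, bright where the light/sky is unoccluded."""
    H, W = occlusion.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float32)
    acc, w = np.zeros_like(occlusion), 1.0
    for i in range(samples):
        t = i / samples                                      # position along the ray
        sy = (ys + (light_px[0] - ys) * t).astype(int).clip(0, H - 1)
        sx = (xs + (light_px[1] - xs) * t).astype(int).clip(0, W - 1)
        acc += w * occlusion[sy, sx]                         # sample toward the light
        w *= decay                                           # exponential falloff
    return acc / samples                                     # additive shaft layer
```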
For both of the above edits, we make use of depth data captured
by the smartphone instead of manual generation or prediction as
done in previous works. We combine relighting with de-weathering
to create new enhanced atmospheric edits (Fig. 13(d) - (f)).
5.3. Image Stylization using Intrinsic Layers
We implement a cartoon stylization pipeline based on the ex-
tended difference-of-Gaussians (XDoG) filter by Winnemöller
et al. [WOG06,WKO12]. The filtering pipeline is enhanced using
the computed intrinsic layers as follows.
5.3.1. Depth-based Edge Detection
Color-based edge detection methods generally fail to accurately
identify edges in the case of smooth or non-apparent lighting tran-
sitions between objects, and might over-emphasize noisy patterns
in the image. To address these issues and enhance geometric edges,
we make use of the given depth data.
We intensify depth variations by computing the angle-sharpness
($\varphi \in [0, 1]$), defined as the magnitude of the normal vectors pointing
away from the camera, $\varphi = \frac{\|N_{xy}\|}{D \, N_z}$, where the image normal $N$ (pro-
duced by Eqn. 14) and the depth $D$ are used to decrease the edge mag-
nitude for distant objects, whose depth information is usually noisy. The
angle-sharpness is used to boost gradients—derived from the struc-
ture tensor—in areas of high angle-sharpness,
$$ST_\varphi = \begin{cases} (\varphi\,\omega + 1)\,ST_D, & \text{if } \varphi < \frac{\omega - 1}{\omega} \\ ST_D, & \text{otherwise} \end{cases} \tag{23}$$
where $ST_D$ is the structure tensor calculated on the depth image in
log space, and $\omega \in [0, 1000]$ is a boost factor for low-luminosity
edges (we use $\omega = 100$ in our experiments). Smoothing $ST_\varphi$ with a
Gaussian yields the smoothed structure tensor, from which the edge
tangent flow is derived via an eigenanalysis [BWBM06].

The flow-based difference-of-Gaussians, as defined in [KD08,
WKO12], is then applied to the angle-sharpness $\varphi$ along the flow
field induced by the smoothed $ST_\varphi$ to obtain coherent depth edges
(Fig. 10(d) and supplementary material).
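Under the reconstruction of Eqn. 23 above, the boost can be sketched as follows (the epsilon guard and function name are ours; ST_D stands for one component of the depth structure tensor):

```python
import numpy as np

def boost_depth_structure_tensor(ST_D, D, N, omega=100.0):
    """Angle-sharpness gradient boost (Sec. 5.3.1, Eqn. 23).
    ST_D: structure-tensor component on the log-depth image, shape (H, W);
    N: (H, W, 3) image normals from Eqn. 14; D: depth map."""
    # Angle-sharpness: normals pointing away from the camera, attenuated
    # by depth so that distant (noisy-depth) edges are weakened.
    phi = np.linalg.norm(N[..., :2], axis=-1) / (D * N[..., 2] + 1e-6)
    phi = np.clip(phi, 0.0, 1.0)
    boost = phi < (omega - 1.0) / omega   # leave near-saturated pixels unboosted
    return np.where(boost, (phi * omega + 1.0) * ST_D, ST_D)
```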
5.3.2. Albedo and Shading Combination
In the color-based cartoon stylization pipeline, the luminance val-
ues of the input are smoothed and quantized to create a flat material
effect. Through the use of our image decomposition framework,
shading and albedo can be combined in multiple ways to enhance
this stylization. Using albedo only, a flat cartoon-like style can be
created (Figs. 10(c) and 13(g)); due to the removal of shading, the
output is brighter than the original image, and geometric features
are mainly indicated by XDoG edges.
There are several ways of abstracting the shading information
before recombining it with albedo for enhanced cartoon styliza-
tion. Edge-preserving smoothing of the shading layer with a large
filter kernel yields an airbrush look (Fig. 10(d)), while quantiz-
ing the shading yields a look similar to a classical cartoon styl-
ization [WOG06]. Another method for flattening shading informa-
tion is to use a segmentation-based approach. We implemented a
GPU-based quick-shift filter [VS08] to segment shading accord-
ing to albedo clusters (Fig. 13(h)). Shading alone, combined with
halftoning and edges, can create a vintage effect (Fig. 13(i)). Shad-
ing abstraction is a single-channel operation and is recombined uni-
formly with albedo in RGB space.
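A compact sketch of these recombinations (our own simplification; the quick-shift segmentation variant [VS08] is omitted):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def recombine(A, S, mode='quantize', bins=4, sigma=8.0):
    """Recombine RGB albedo with abstracted scalar shading (Sec. 5.3.2)."""
    if mode == 'quantize':                 # classical cartoon look
        S_abs = np.floor(S * bins) / bins
    elif mode == 'smooth':                 # airbrush look (a Gaussian stands in
        S_abs = gaussian_filter(S, sigma)  # for edge-preserving smoothing)
    else:                                  # 'flat': drop shading entirely
        S_abs = np.ones_like(S)
    return A * S_abs[..., None]            # uniform recombination in RGB space
```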
5.3.3. Shadows
Shadows in hand-drawn cartoons are an important tool to convey
geometric and lighting cues about the scene and are also often
strategically placed to emphasize character expressions. A method
based on occlusion maps can be used to generate soft-shadows with
semi-realistic lighting (Fig. 11(b)). To create less realistic but more
cartoon-like hard shadows, we assume that shadows are only set on
a foreground object and approximate the lighting based on an angu-
lar thresholding of the depth map. For a given pixel, the re-lit
shading $\hat{s}$ is defined as:
$$\hat{s} = \begin{cases} s, & \text{if } \|\arctan(n_y, n_x) - \rho\| < \theta \;\text{ and }\; \arccos\!\big(\tfrac{n_z}{\|\mathbf{n}\|}\big) < \gamma \\ s \cdot l, & \text{otherwise} \end{cases} \tag{24}$$
Figure 12: Comparison of intrinsic decomposition with other methods. The figure contains the input image and the corresponding albedo obtained using our method and the intrinsic decomposition methods of Bell et al. [BBS14], Jeon et al. [JCTL14], and Lettry et al. [LVVG18]. Please see the supplementary material for shading results.
where $l \in [0, 2]$ is a luminance multiplier that either emulates
shadow ($l < 1$) or lighting ($l > 1$), $\rho$ is an angle that controls
the shadow direction around the foreground object, and $\theta$ is the
shadow circumference, calculated by thresholding the angle
deviation from $\rho$. To emulate the depth of the light source, the
normal z-angle thresholding includes only surface normals that
point at least $\gamma$ degrees away from the camera (Fig. 11(c), with
$\rho = \pi$, $\theta = \pi$, $\gamma = 0.01$).
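A numpy transcription of Eqn. 24 as reconstructed above (the function name and the numerical guard are ours):

```python
import numpy as np

def hard_shadow(S, N, l=0.6, rho=np.pi, theta=np.pi, gamma=0.01):
    """Cartoon hard shadows (Eqn. 24): keep shading s where the normal's
    image-plane angle lies within theta of direction rho and its z-angle
    is below gamma; multiply by the luminance factor l elsewhere."""
    ang = np.arctan2(N[..., 1], N[..., 0])  # arctan(n_y, n_x)
    cosz = N[..., 2] / (np.linalg.norm(N, axis=-1) + 1e-6)
    keep = (np.abs(ang - rho) < theta) & (np.arccos(np.clip(cosz, -1, 1)) < gamma)
    return np.where(keep, S, S * l)
```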
6. Discussion and Limitations
Our goal is to provide photorealistic, interactive image editing us-
ing readily available RGB-D data on high-end smartphones. To this
end, we implement an intrinsic decomposition technique capable
of running on smartphones. The trade-off between performance
and accuracy (Sec. 4) is biased towards performance for the sake
of interactivity, but we are nonetheless able to obtain high-quality
results. Unlike most previous methods, we perform a pre-
processing step of specularity removal and do not assume “only
diffuse reflection” in the scene. We observe that this am-
biguity affects not only state-of-the-art methods but also the
popular intrinsic dataset MPI-Sintel [BWSB12]. For MPI-Sintel,
specularity is encoded as part of the shading information, which is
physically inaccurate. Our observations suggest that specularities
are formed as a complex interplay between reflectance and shad-
ing, and thus should be handled separately.
The extracted intrinsic layers—along with the available depth
data—allow for a variety of image manipulations. However, we
make some simplifying assumptions to achieve interactive pro-
cessing and cope with the limited computing capabilities of mo-
bile phones—note that most of these assumptions are also com-
mon for many state-of-the-art desktop-based methods. First of all,
we only consider direct illumination and ignore the multi-bounce
effects of light, such as color bleeding and soft shadows. The as-
sumption of white colored illumination is also not valid for many
Figure 13: Showcasing results of our full pipeline. Material edits: (a) retexturing, (b) recoloring, (c) translucency; atmospheric edits: (d) fog, (e) God rays, (f) fog + God rays; stylization: (g) albedo + depth edges, (h) quick-shift shading, (i) halftoned shading. Each row also shows the respective input.
real-world scenes. A multi-color illuminant can cause color varia-
tions that can be mistakenly classified as albedo instead of shad-
ing. We initialize albedo with a chromaticity image for improved
performance [MZRT16], and do not perform clustering in the chro-
maticity domain, which leads to color shifts, especially in regions
with low pixel intensity. Despite the above limitations, our tech-
nique gives plausible application results at interactive frame rates.
7. Conclusions and Future Work
We present a system that performs intrinsic image de-
composition on smartphones; to the best of our knowledge, it is
the first such approach. Using the depth data captured
by built-in depth sensors on smartphones, together with a novel
specularity removal pre-processing step, we are able to obtain high-
quality results. A GPU-based implementation using the Metal API
allows for close-to-interactive optimization solving and interactive
image editing. A qualitative evaluation shows that our specularity
removal method performs better than state-of-the-art approaches
for real-world images. The albedo and shading layer results are on
par with state-of-the-art desktop-based methods. Finally, we show-
case how the intrinsic layers can be used for a variety of image-
editing applications.
A mobile-based intrinsic decomposition, as provided in this
work, could be used for photo-realistic image editing in Augmented
Reality (AR) applications. As part of future work, we aim to re-
lax some of the existing assumptions and address image scenes
with multi-color illuminants [BT17] and indirect illumination ef-
fects [MSZ19]. We also expect that super-resolution of depth
maps can further enhance our results [VAE19]. Moreover, we be-
lieve that our specular pixel detection can be made more robust
with a non-binary thresholding and better handling of bright image
regions.
Acknowledgements
We thank the anonymous reviewers for their valuable feedback.
We thank Mohammad Shafiei and Mahesh Chandra for valuable
discussions regarding the optimization solver. We thank Florence Böttger
for her help with the development of the atmospheric editing pipeline.
We thank Ariane Morassi Sasso, Harry Freitas da Cruz, Orhan
Konak and Jessica Jall for patiently posing for the pictures. This
work was funded by the German Federal Ministry of Education
and Research (BMBF) (through grants 01IS15041 – “mdViProject”
and 01IS19006 – “KI-Labor ITSE”) and the Research School on
“Service-Oriented Systems Engineering” of the Hasso Plattner In-
stitute.
References
[ABC11] ARTUSI A., BANTERLE F., CHETVERIKOV D.: A survey of specularity removal methods. Computer Graphics Forum 30, 8 (2011), 2208–2230.
[AG16] ALPEROVICH A., GOLDLUECKE B.: A variational model for intrinsic light field decomposition. In Asian Conference on Computer Vision (ACCV), November 20-24 (2016), vol. 10113 of Lecture Notes in Computer Science, pp. 66–82.
[AJSG18] ALPEROVICH A., JOHANNSEN O., STRECKE M., GOLDLUECKE B.: Light field intrinsics with a deep encoder-decoder network. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 18-22 (2018), IEEE Computer Society, pp. 9145–9154.
[AO16] AKASHI Y., OKATANI T.: Separation of reflection components by sparse non-negative matrix factorization. Computer Vision and Image Understanding 146, C (May 2016), 77–85.
[BBPA15] BOYADZHIEV I., BALA K., PARIS S., ADELSON E.: Band-sifting decomposition for image-based material editing. ACM Transactions on Graphics 34, 5 (Nov. 2015).
[BBS14] BELL S., BALA K., SNAVELY N.: Intrinsic images in the wild. ACM Transactions on Graphics 33, 4 (July 2014).
[BHK16] BEIGPOUR S., HA M. L., KUNZ S., KOLB A., BLANZ V.: Multi-view multi-illuminant intrinsic dataset. In Proceedings of the British Machine Vision Conference (BMVC) (September 2016), pp. 10.1–10.13.
[BKPB17] BONNEEL N., KOVACS B., PARIS S., BALA K.: Intrinsic decompositions for image editing. Computer Graphics Forum 36, 2 (May 2017), 593–609.
[BLL96] BAJCSY R., LEE S. W., LEONARDIS A.: Detection of diffuse and specular interface reflections and inter-reflections by color image segmentation. International Journal of Computer Vision 17, 3 (Mar. 1996), 241–272.
[BM15] BARRON J. T., MALIK J.: Shape, illumination, and reflectance from shading. IEEE Transactions on Pattern Analysis and Machine Intelligence 37, 8 (2015), 1670–1687.
[BSM18] BEIGPOUR S., SHEKHAR S., MANSOURYAR M., MYSZKOWSKI K., SEIDEL H.-P.: Light-field appearance editing based on intrinsic decomposition. Journal of Perceptual Imaging 1, 1 (2018), 15.
[BST14] BONNEEL N., SUNKAVALLI K., TOMPKIN J., SUN D., PARIS S., PFISTER H.: Interactive intrinsic video editing. ACM Transactions on Graphics 33, 6 (Nov. 2014).
[BT78] BARROW H., TENENBAUM J.: Recovering intrinsic scene characteristics from images. Tech. rep., Artificial Intelligence Center, SRI International, 1978.
[BT17] BARRON J. T., TSAI Y.: Fast Fourier color constancy. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), pp. 6950–6958.
[BWBM06] BROX T., WEICKERT J., BURGETH B., MRÁZEK P.: Nonlinear structure tensors. Image and Vision Computing 24, 1 (2006), 41–55.
[BWSB12] BUTLER D. J., WULFF J., STANLEY G. B., BLACK M. J.: A naturalistic open source movie for optical flow evaluation. In Computer Vision – ECCV 2012 (2012), Fitzgibbon A., Lazebnik S., Perona P., Sato Y., Schmid C., (Eds.), pp. 611–625.
[CK13] CHEN Q., KOLTUN V.: A simple model for intrinsic image decomposition with depth cues. In IEEE International Conference on Computer Vision (ICCV) (2013), pp. 241–248.
[CZL18] CHENG L., ZHANG C., LIAO Z.: Intrinsic image transformation via scale space decomposition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018), pp. 656–665.
[DRC15] DUCHÊNE S., RIANT C., CHAURASIA G., MORENO J. L., LAFFONT P.-Y., POPOV S., BOUSSEAU A., DRETTAKIS G.: Multi-view intrinsic images of outdoors scenes with an application to relighting. ACM Transactions on Graphics 34, 5 (Nov. 2015).
[FVH19] FORD B., VESTERGAARD J. S., HAYWARD D.: Advances in camera capture and photo segmentation, 2019. https://developer.apple.com/videos/play/wwdc2019/260/.
[FZS19] FU G., ZHANG Q., SONG C., LIN Q., XIAO C.: Specular highlight removal for real-world images. Computer Graphics Forum 38, 7 (2019), 253–263.
[GEZ17] GARCES E., ECHEVARRIA J. I., ZHANG W., WU H., ZHOU K., GUTIERREZ D.: Intrinsic light field images. Computer Graphics Forum 36, 8 (2017), 589–599.
[GJAF09] GROSSE R., JOHNSON M. K., ADELSON E. H., FREEMAN W. T.: Ground truth dataset and baseline evaluations for intrinsic image algorithms. In International Conference on Computer Vision (ICCV) (2009), pp. 2335–2342.
[GZW18] GUO J., ZHOU Z., WANG L.: Single image highlight removal with a sparse and low-rank reflection model. In European Conference on Computer Vision (ECCV), Munich, Germany, September 8-14 (2018), pp. 282–298.
[HGW15] HACHAMA M., GHANEM B., WONKA P.: Intrinsic scene decomposition from RGB-D images. In IEEE International Conference on Computer Vision (ICCV) (2015), pp. 810–818.
[HP02] HOFFMAN N., PREETHAM A. J.: Rendering outdoor light scattering in real time, 2002. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/ATI-LightScattering.pdf.
[IRWM17] INNAMORATI C., RITSCHEL T., WEYRICH T., MITRA N. J.: Decomposing single images for layered photo retouching. Computer Graphics Forum 36, 4 (2017), 15–25.
[JCTL14] JEON J., CHO S., TONG X., LEE S.: Intrinsic image decomposition using structure-texture separation and surface normals. In European Conference on Computer Vision (ECCV) (2014), pp. 218–233.
[KD08] KYPRIANIDIS J. E., DÖLLNER J.: Image abstraction by structure adaptive filtering. In Theory and Practice of Computer Graphics (2008), The Eurographics Association.
[KJHK13] KIM H., JIN H., HADAP S., KWEON I.: Specular reflection separation using dark channel prior. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2013), pp. 1460–1467.
[KPSL16] KIM S., PARK K., SOHN K., LIN S.: Unified depth prediction and intrinsic image decomposition from a single image via joint convolutional neural fields. In European Conference on Computer Vision (ECCV) (2016), pp. 143–159.
[KRFB06] KHAN E. A., REINHARD E., FLEMING R. W., BÜLTHOFF H. H.: Image-based material editing. ACM Transactions on Graphics 25, 3 (July 2006), 654–663. 7
[KSK88] KLINKER G. J., SHAFER S. A., KANADE T.: The measurement of highlights in color images. International Journal of Computer Vision 2, 1 (Jun 1988), 7–32. 3
[LBD13] LAFFONT P., BOUSSEAU A., DRETTAKIS G.: Rich intrin-
sic image decomposition of outdoor scenes from multiple views. IEEE
Transactions on Visualization and Computer Graphics 19, 2 (2013), 210–
224. 3
[LLZI17] LI C., LIN S., ZHOU K., IKEUCHI K.: Specular highlight removal in facial images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), pp. 2780–2789. 3
[LM71] LAND E. H., MCCANN J. J.: Lightness and retinex theory. Journal of the Optical Society of America 61, 1 (1971), 1–11. 3,5
[LM79] LIONS P. L., MERCIER B.: Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis 16, 6 (1979), 964–979. 6
[LS18] LI Z., SNAVELY N.: Learning intrinsic image decomposition from watching the world. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018), pp. 9039–9048. 2,3
[LSR20] LI Z., SHAFIEI M., RAMAMOORTHI R., SUNKAVALLI K., CHANDRAKER M.: Inverse rendering for complex indoor scenes: Shape, spatially-varying lighting and SVBRDF from a single image. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020), pp. 2472–2481. 3
[LVv18] LETTRY L., VANHOEY K., VAN GOOL L.: DARN: A deep adversarial residual network for intrinsic image decomposition. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV) (2018), pp. 1359–1367. 2
[LVVG18] LETTRY L., VANHOEY K., VAN GOOL L.: Unsupervised deep single-image intrinsic decomposition using illumination-varying image sequences. Computer Graphics Forum 37, 7 (2018), 409–419. 2,7,10
[LXR18] LI Z., XU Z., RAMAMOORTHI R., SUNKAVALLI K., CHANDRAKER M.: Learning to reconstruct shape and spatially-varying reflectance from a single image. ACM Transactions on Graphics 37, 6 (Dec. 2018). 3
[LZT12] LEE K. J., ZHAO Q., TONG X., GONG M., IZADI S., LEE S. U., TAN P., LIN S.: Estimation of intrinsic image sequences from image+depth video. In European Conference on Computer Vision (ECCV) (2012), pp. 327–340. 3
[MCZ18] MA W.-C., CHU H., ZHOU B., URTASUN R., TORRALBA A.: Single image intrinsic decomposition without a single intrinsic image. In European Conference on Computer Vision (ECCV) (2018), pp. 211–229. 2,3
[Mit08] MITCHELL K.: Volumetric light scattering as a post-process. In
GPU Gems 3, Nguyen H., (Ed.). Addison-Wesley, 2008, pp. 275–285. 9
[MQD17] MÉLOU J., QUÉAU Y., DUROU J.-D., CASTAN F., CREMERS D.: Beyond multi-view stereo: Shading-reflectance decomposition. In Scale Space and Variational Methods in Computer Vision (2017), pp. 694–705. 3
[MSZ19] MEKA A., SHAFIEI M., ZOLLHOEFER M., RICHARDT C., THEOBALT C.: Live illumination decomposition of videos. arXiv preprint arXiv:1908.01961 (2019). 11
[MZBK06] MALLICK S. P., ZICKLER T., BELHUMEUR P. N., KRIEGMAN D. J.: Specularity removal in images and videos: A PDE approach. In European Conference on Computer Vision (ECCV) (2006), pp. 550–563. 3
[MZRT16] MEKA A., ZOLLHÖFER M., RICHARDT C., THEOBALT C.: Live intrinsic video. ACM Transactions on Graphics 35, 4 (July 2016). 3,7,11
[NN03] NARASIMHAN S. G., NAYAR S.: Interactive deweathering of
an image using physical models. In ICCV Workshop on Color and
Photometric Methods in Computer Vision (October 2003). 7,8
[OCBP14] OCHS P., CHEN Y., BROX T., POCK T.: iPiano: Inertial proximal algorithm for non-convex optimization. SIAM Journal on Imaging Sciences 7, 2 (2014), 1388–1419. 6
[Qia99] QIAN N.: On the momentum term in gradient descent learning algorithms. Neural Networks 12, 1 (1999), 145–151. 6
[Ram20] RAMOS V.: SIHR: a MATLAB/GNU Octave toolbox for single
image highlight removal. Journal of Open Source Software 5, 45 (Jan.
2020), 1822. 6
[RH01] RAMAMOORTHI R., HANRAHAN P.: An efficient representation for irradiance environment maps. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (2001), SIGGRAPH '01, pp. 497–500. 6
[SBZ18] SHEKHAR S., BEIGPOUR S., ZIEGLER M., CHWESIUK M., PALEN D., MYSZKOWSKI K., KEINERT J., MANTIUK R., DIDYK P.: Light-field intrinsic dataset. In British Machine Vision Conference (BMVC), Newcastle, UK, September 3-6 (2018), p. 120. 2,7
[SC09] SHEN H.-L., CAI Q.-Y.: Simple and efficient method for spec-
ularity removal in an image. Applied Optics 48, 14 (May 2009), 2711–
2719. 3,6,8
[SDSY17] SHI J., DONG Y., SU H., YU S. X.: Learning non-Lambertian object intrinsics across ShapeNet categories. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), pp. 1685–1694. 2
[Sha85] SHAFER S. A.: Using color to separate reflection components. Color Research & Application 10, 4 (1985), 210–218. 2
[SHZ18] SANDLER M., HOWARD A., ZHU M., ZHMOGINOV A., CHEN L.: MobileNetV2: Inverted residuals and linear bottlenecks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018), pp. 4510–4520. 8
[SLM08] SHARAN L., LI Y., MOTOYOSHI I., NISHIDA S., ADELSON E. H.: Image statistics for surface reflectance perception. Journal of the Optical Society of America A 25, 4 (Apr 2008), 846–865. 3
[TC13] TIAN Q., CLARK J. J.: Real-time specularity detection using unnormalized Wiener entropy. In International Conference on Computer and Robot Vision (2013), pp. 356–363. 4
[TFA05] TAPPEN M. F., FREEMAN W. T., ADELSON E. H.: Recovering intrinsic images from a single image. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 9 (Sept. 2005), 1459–1472. 2,3
[TI05] TAN R. T., IKEUCHI K.: Separating reflection components of textured surfaces using a single image. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 2 (Feb. 2005), 178–193. 3
[TM98] TOMASI C., MANDUCHI R.: Bilateral filtering for gray and
color images. In Sixth International Conference on Computer Vision
(IEEE Cat. No.98CH36271) (1998), pp. 839–846. 5
[VAE19] VOYNOV O., ARTEMOV A., EGIAZARIAN V., NOTCHENKO A., BOBROVSKIKH G., BURNAEV E., ZORIN D.: Perceptual deep depth super-resolution. In IEEE International Conference on Computer Vision (ICCV) (2019), pp. 5652–5662. 12
[VS08] VEDALDI A., SOATTO S.: Quick shift and kernel methods for
mode seeking. In European Conference on Computer Vision (ECCV)
(2008), pp. 705–718. 9
[WKO12] WINNEMÖLLER H., KYPRIANIDIS J. E., OLSEN S. C.: XDoG: An extended difference-of-Gaussians compendium including advanced image stylization. Computers & Graphics 36, 6 (2012), 740–753. 9
[WLYY17] WANG Y., LI K., YANG J., YE X.: Intrinsic decomposition from a single RGB-D image with sparse and non-local priors. In IEEE International Conference on Multimedia and Expo (ICME) (2017), pp. 1201–1206. 2
[WOG06] WINNEMÖLLER H., OLSEN S. C., GOOCH B.: Real-time video abstraction. ACM Transactions on Graphics (TOG) 25, 3 (2006), 1221–1226. 9
[YGL14] YE G., GARCES E., LIU Y., DAI Q., GUTIERREZ D.: Intrinsic video and applications. ACM Transactions on Graphics 33, 4 (July 2014). 3
[YWA10] YANG Q., WANG S., AHUJA N.: Real-time specular highlight
removal using bilateral filtering. In European Conference on Computer
Vision (ECCV) (2010), pp. 87–100. 3,6,8
[ZKE15] ZHOU T., KRAHENBUHL P., EFROS A. A.: Learning data-driven reflectance priors for intrinsic image decomposition. In IEEE International Conference on Computer Vision (ICCV) (2015), pp. 3469–3477. 2,3
[ZTD12] ZHAO Q., TAN P., DAI Q., SHEN L., WU E., LIN S.: A closed-form solution to retinex with nonlocal texture constraints. IEEE Transactions on Pattern Analysis and Machine Intelligence 34, 7 (July 2012), 1437–1444. 2,3
Interactive Photo Editing on Smartphones
via Intrinsic Decomposition
(Supplementary Material)
1. Ablation Study and Parameter Analysis
In this section, we demonstrate the effectiveness of the intermediate steps and of different parameter settings for achieving better results in terms of both specularity removal and intrinsic decomposition.
1.1. Specularity Removal
Our specularity removal approach described in Section 3.1 of the main paper consists of three steps, namely:
1. identification of specularity,
2. intensity reduction of specular pixels, and
3. chroma inpainting of specular pixels.
The first step uses a single parameter to identify specular pixels: the threshold value τ (Section 3.1.1 of the main paper). In Fig. 1, we show how the specular mask changes with increasing values of τ. In the next two steps, only the pixels within the specular mask are processed to remove the inherent specularity. To address high- and low-frequency specularity separately, we scale down the positive coefficients of the high-frequency and low-frequency intensity sub-bands using the parameters κh and κl, respectively (Section 3.1.2 of the main paper). This intensity reduction step exposes the missing chroma information for saturated specular pixels, see Fig. 2(d). We fill in the missing chroma information by iterative bilateral filtering (Section 3.1.3 of the main paper). The initial chromaticity image with the missing information at specular pixels is denoted C0; after iteration k+1, the modified image is given as
C_{k+1}(\mathbf{p}) = \frac{1}{W_{\mathbf{p}}} \sum_{\mathbf{q} \in M(\mathbf{p})} G_{\sigma_s}(\lVert \mathbf{p} - \mathbf{q} \rVert) \, G_{\sigma_r}(\lVert C_k(\mathbf{p}) - C_k(\mathbf{q}) \rVert) \, C_k(\mathbf{q}),   (1)
where M(\mathbf{p}) denotes the filter neighborhood of pixel \mathbf{p}, and the normalization factor W_{\mathbf{p}} is computed as:
W_{\mathbf{p}} = \sum_{\mathbf{q} \in M(\mathbf{p})} G_{\sigma_s}(\lVert \mathbf{p} - \mathbf{q} \rVert) \, G_{\sigma_r}(\lVert C_k(\mathbf{p}) - C_k(\mathbf{q}) \rVert).   (2)
The degree of inpainting in each iteration is controlled by the parameters σs and σr applied to the image Ck, yielding a seamless diffuse output, see Fig. 2(e). The effects of chroma inpainting with varying values of σs and σr are depicted in Fig. 3 and Fig. 4, respectively. To further motivate the intermediate steps, we show input and output images for both chroma and intensity in Fig. 5. Our filtering pipeline allows us to individually remove high-frequency and/or low-frequency specularities, and can also be used to enhance them (refer to Fig. 6).
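For illustration, the following is a minimal CPU sketch of the iterative bilateral chroma inpainting of Eqs. (1)-(2) in Python/NumPy. Function and parameter names are ours, the odd kernel size approximates the 10 × 10 window used above, and the paper's actual implementation runs on the GPU via the Metal API:

import numpy as np

def inpaint_chroma(chroma, mask, sigma_s=7.25, sigma_r=0.2,
                   ksize=11, iterations=5):
    # Iteratively re-estimate the chroma of masked (specular) pixels as a
    # bilateral average over their local neighborhood M(p), cf. Eqs. (1)-(2).
    C = chroma.astype(np.float64).copy()
    h, w = C.shape[:2]
    r = ksize // 2
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma_s ** 2))  # G_sigma_s
    for _ in range(iterations):
        C_next = C.copy()
        for py, px in zip(*np.nonzero(mask)):
            y0, y1 = max(py - r, 0), min(py + r + 1, h)
            x0, x1 = max(px - r, 0), min(px + r + 1, w)
            patch = C[y0:y1, x0:x1]
            Gs = spatial[y0 - py + r:y1 - py + r, x0 - px + r:x1 - px + r]
            diff = patch - C[py, px]
            Gr = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma_r ** 2))  # G_sigma_r
            weights = Gs * Gr              # per-neighbor bilateral weight
            Wp = weights.sum()             # normalization factor, Eq. (2)
            if Wp > 1e-8:
                C_next[py, px] = (weights[..., None] * patch).sum(axis=(0, 1)) / Wp
        C = C_next                         # C_k -> C_{k+1}
    return C

Running more iterations propagates valid chroma deeper into larger specular regions, mirroring the iterative scheme above.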
Input τ = 0.04 τ = 0.12 τ = 0.24
Figure 1: The specular mask for the given input image with increasing values of the threshold parameter τ.
(a) Input (b) Only Low Freq. (c) Only High Freq. (d) Both (e) w. Inpainting
Figure 2: Diffuse outputs when we only reduce the positive coefficients of the (b) low-frequency (0.5 ≥ κl ≥ 0.2) and (c) high-frequency (0.5 ≥ κh ≥ 0.2) intensity sub-bands individually. The removal of both high- and low-frequency coefficients exposes the missing chroma information (d), which is then inpainted to get a seamless output (e).
Initial σs = 1.5 σs = 3.0 σs = 7.25 (rows: chroma, diffuse)
Figure 3: The inpainted chroma and the corresponding diffuse image output for different values of σs; in all the above cases we use σr = 3.0 and a kernel size of 10 × 10 pixels.
Initial σr = 0.1 σr = 0.2 σr = 0.4 (rows: chroma, diffuse)
Figure 4: The inpainted chroma and the corresponding diffuse image output for different values of σr; in all the above cases we use σs = 7.25 and a kernel size of 10 × 10 pixels.
Input, Diffuse, Chroma Input, Chroma Output, Intensity Input, Intensity Output
Figure 5: Intermediate results of our diffuse-image generation pipeline. Note how the output intensity and chroma images are modified compared to the input; please zoom in for details.
(a) Input (b) Diffuse (c) Diffuse Low Freq. (d) Diffuse High Freq. (e) Specular Enhanced
Figure 6: To obtain the diffuse image (b), we remove both the high- and low-frequency specularities (0.5 ≥ κh, κl ≥ 0.2). One can remove only the low-frequency specularities (0.5 ≥ κl ≥ 0.2) or only the high-frequency specularities (0.5 ≥ κh ≥ 0.2) to obtain (c) and (d), respectively. The specularities can also be seamlessly enhanced using our pipeline (κh, κl > 1.0) to obtain (e).
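The κ manipulation above amounts to scaling only the positive coefficients of the intensity sub-bands. The sketch below illustrates the idea in Python/NumPy; the simple two-band Gaussian split and all names are our assumptions, standing in for the paper's multi-scale intensity decomposition:

import numpy as np
from scipy.ndimage import gaussian_filter

def scale_positive_coefficients(subband, kappa):
    # Scale only positive coefficients: kappa < 1 attenuates the bright
    # (specular) excursions of the band, kappa > 1 enhances them.
    out = subband.copy()
    out[out > 0.0] *= kappa
    return out

def adjust_specularity(intensity, kappa_h=0.3, kappa_l=0.3,
                       sigma_h=2.0, sigma_l=8.0):
    # Two-band stand-in for the multi-scale decomposition (an assumption
    # of this sketch): high = fine detail, low = coarse specular lobes.
    blur_h = gaussian_filter(intensity, sigma_h)
    blur_l = gaussian_filter(intensity, sigma_l)
    high = intensity - blur_h     # high-frequency sub-band
    low = blur_h - blur_l         # low-frequency sub-band
    base = blur_l                 # smooth residual, left untouched
    return (scale_positive_coefficients(high, kappa_h)
            + scale_positive_coefficients(low, kappa_l)
            + base)

With κh = κl = 1 the bands recombine to the original intensity; values below one remove and values above one enhance the specular contribution, consistent with Fig. 6.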
The list of images (and their sources) used to demonstrate our specularity removal method:
Face - https://cdn.repeller.com/wp-content/uploads/2020/02/YLM_THE_POUF_PC-848x1272.jpg
Peas - https://img1.goodfon.com/wallpaper/nbig/3/19/zelenyy-goroshek-goroshiny.jpg
Paprika - https://p0.pikist.com/photos/124/344/sweet-peppers-paprika-green-yellow-red-healthy-vitamins-pepper-vegetables.jpg
Pebbles - https://c.pxhere.com/photos/ab/89/rocks_stones_colorful_colourful_pebbles-641429.jpg!d
Toy Soldiers - https://2.bp.blogspot.com/-YvcgGOcL404/V1YLs-_LZFI/AAAAAAAACFA/xL2rEHvA9zI9EGKcTgqz7MVp4tgmvvJnwCLcB/s1600/lgam%2B1.jpg
1.2. Intrinsic Decomposition
For intrinsic decomposition, we solve for both albedo (a) and shading (s) simultaneously by minimizing the energy function below (Section 3.2 of the main paper),
E(\mathbf{x}) = \frac{1}{2} \Big( \underbrace{\lambda_{ra} E_{ra}(\mathbf{x})}_{E_1} + \underbrace{\lambda_{rs} E_{rs}(\mathbf{x}) + \lambda_{d} E_{d}(\mathbf{x})}_{E_2} \Big) + \underbrace{\lambda_{sp} \lVert a(\mathbf{x}) \rVert_1}_{E_3},   (3)
where λra Era, λrs Ers, and λd Ed are the retinex-albedo smoothness, retinex-shading smoothness, and data terms, respectively, with their corresponding weights. We use an L1 regularizer to enforce sparsity in the resulting albedo, controlled by the weight λsp. We can further divide the energy into three parts, as depicted in Eqn. (3): (i) E1, responsible for edge-preserving smoothing of the initial albedo; (ii) E2, the contribution of shading smoothing and its coupling with the albedo optimization; and (iii) E3, which controls the sparsity of the resulting albedo. For the ablation study, we build our energy one component at a time and analyze how it improves the final output, see Fig. 7. Further, to understand the effect of each individual energy term on the final result, we vary its weight parameter and analyze how this affects the output albedo or shading. For the given input image, we get the best results with the following parameter values: for the energy formulation, λra = 5.0, αra = 10.0, λrs = 50.0, αrs = 100.0, and λd = 0.02; for the optimization solver, α = 0.0025 and β = 0.01, with λsp = 0.14. We then vary the weights of the individual energy terms and analyze the output albedo/shading as shown in Fig. 8, Fig. 9, Fig. 10, and Fig. 11.
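Since the smooth terms E1 and E2 are coupled with the non-smooth L1 term E3, the energy is minimized with the iPiano solver [OCBP14] (cf. Section 3.2 of the main paper). As a schematic one-variable sketch, not the paper's implementation, one iPiano iteration with the α, β, and λsp values above looks as follows; grad_f is a placeholder for the gradient of the smooth part E1 + E2:

import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (elementwise soft-shrinkage).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ipiano(x0, grad_f, alpha=0.0025, beta=0.01, lam_sp=0.14, iters=500):
    # iPiano update [OCBP14]:
    #   x_{k+1} = prox_{alpha*g}( x_k - alpha*grad_f(x_k) + beta*(x_k - x_{k-1}) )
    # with f the smooth energy (E1 + E2) and g(x) = lam_sp * ||x||_1 (E3);
    # beta is the inertial (momentum) weight.
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(iters):
        y = x - alpha * grad_f(x) + beta * (x - x_prev)
        x_prev, x = x, soft_threshold(y, alpha * lam_sp)
    return x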
(a) Input (b) Only E1 (c) E1 + E2 (d) E1 + E2 + E3
Figure 7: Input image and the corresponding albedo as we add different components to the energy formulation. Note how the fine details in the resulting albedo become more visible with each added component.
(a) Input (b) λra = 2.32 (c) λra = 15.0 (d) λra = 30.63
Figure 8: Input image and the corresponding albedo output with increasing values of the weight λra. The "Retinex-Albedo smoothness" term smooths the output albedo while preserving the edges. However, too much smoothing starts to blur important details, see (d).
(a) Input (b) λrs = 16.12 (c) λrs = 50.0 (d) λrs = 90.83
Figure 9: Input image and the corresponding shading output with increasing values of the weight λrs. The "Retinex-Shading smoothness" term smooths the output shading while preserving the edges. However, too much smoothing starts to blur important details, see (d).
(a) Input (b) λd = 0.005 (c) λd = 0.024 (d) λd = 0.046
Figure 10: Input image and the corresponding albedo with increasing values of the weight λd. The "Data" term couples the separate optimization solvers for albedo and shading. However, if this coupling is too strong, it introduces shading details into the output albedo, see (d).
(a) Input (b) λsp = 0.045 (c) λsp = 0.145 (d) λsp = 0.222
Figure 11: Input image and the corresponding albedo with increasing values of the weight λsp. The "L1 sparsity" term makes the output albedo more sparse; however, an excess of sparsity leads to spatially inconsistent patch-like artifacts in the output albedo, see (d).
2. Material Appearance Editing – More Results
Input Silk Tattoo Hulk Glass
Figure 12: Showcasing more results for material editing. Please zoom in for a better visualization of the tattoo results.
3. Atmospheric Appearance Editing – More Results
Input Low-density Fog High-density Fog
Figure 13: Showcasing more results for virtual fog.
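For reference, virtual fog of this kind is commonly composited from the depth map with an exponential attenuation model, in the spirit of the physically based deweathering of Narasimhan and Nayar [NN03]. The following Python/NumPy sketch uses this standard model; the function name, parameters, and defaults are illustrative, not the app's actual values:

import numpy as np

def add_fog(image, depth, density=0.5, airlight=(0.8, 0.8, 0.85)):
    # Standard single-scattering fog model:
    #   I_fog = I * exp(-density * d) + A * (1 - exp(-density * d)),
    # where d is the per-pixel depth and A the airlight color.
    t = np.exp(-density * depth)[..., None]   # transmission map, shape (h, w, 1)
    A = np.asarray(airlight, dtype=np.float64)
    return image * t + A * (1.0 - t)

Increasing density moves the result from the low-density toward the high-density appearance shown in Fig. 13.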
Input God rays God rays + Fog
Figure 14: Showcasing more results for God rays.
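God rays are typically rendered as a screen-space radial blur of a bright-pass image toward the light's projected position, as in the post-process of Mitchell [Mit08]. A hedged Python/NumPy sketch of that idea follows; the sample count, decay, and weight values are illustrative:

import numpy as np
from scipy.ndimage import map_coordinates

def god_rays(bright, light_xy, num_samples=32, density=0.9,
             decay=0.95, weight=0.06):
    # March from every pixel toward the light's screen position, accumulating
    # progressively decayed samples of the bright-pass image `bright`.
    h, w = bright.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    dx = (light_xy[0] - xs) * (density / num_samples)  # step toward light (x)
    dy = (light_xy[1] - ys) * (density / num_samples)  # step toward light (y)
    rays = np.zeros_like(bright, dtype=np.float64)
    illum = 1.0
    sx, sy = xs.copy(), ys.copy()
    for _ in range(num_samples):
        sx += dx
        sy += dy
        sample = map_coordinates(bright, [sy, sx], order=1, mode='constant')
        rays += sample * illum * weight
        illum *= decay                                  # exponential falloff
    return np.clip(rays, 0.0, 1.0)

The accumulated rays are then added on top of the (optionally fogged) image, which yields combinations such as the "God rays + Fog" column of Fig. 14.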
4. Image Stylization – More Results
(a) Input (b) Albedo (c) Quickshift (d) Halftoning
(e) Input (f) Luminance-based cartoon (g) Depth edges + quantized shading (h) Depth edges + smoothed shading
(i) Input (j) Dark edges (k) Quickshift (l) Depth tangent-flow
Figure 15: More results for the full stylization pipeline. The first row demonstrates variants of using the decomposition information, by (b) only using albedo, (c) adding segmented shading using quickshift [VS08], and (d) halftoning the shading. The second row compares the luminance-based cartoon filter by Winnemöller et al. [WOG06, WKO12] (f) with the depth-edge enhanced cartoons (g) & (h); the depth edges are especially noticeable on the glass. Also, (g) & (h) show variants of albedo & shading recombination for a posterize effect and an airbrush-like effect. The third row demonstrates stylization of outdoor scenes without a foreground subject. Increasing the blackness and edge thickness creates a dark/moody effect (j), and using quickshift shading increases the fogginess of the image (k). While the captured depth accuracy is limited, it can be used to create abstract effects such as visualizing depth (l) by increasing the depth-normal boost factor ω in STφ.
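Halftoning the shading layer, as in Fig. 15(d) and the "noir" style later in this section, can be sketched as thresholding the shading against a tiled dot pattern. The clustered-dot construction below is our illustrative choice, not necessarily the app's screening method:

import numpy as np

def halftone_shading(shading, cell=8):
    # Compare single-channel shading in [0, 1] against a per-cell radial
    # threshold ramp: brighter shading keeps larger white dots.
    h, w = shading.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy = (yy % cell) - cell / 2.0 + 0.5   # offset from cell center (y)
    cx = (xx % cell) - cell / 2.0 + 0.5   # offset from cell center (x)
    r = np.sqrt(cx ** 2 + cy ** 2) / (cell / np.sqrt(2.0))  # ~[0, 1] ramp
    return (shading > r).astype(np.float64)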
(a) Input Image (b) Cartoon - luminance edges only
(c) Cartoon - depth edges only (d) Cartoon - combined edges
Figure 16: Luminance-based edges and depth edges are combined to reduce obstructions (as visible in the face) and to keep thick outlines around actual object borders.
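A sketch of how such edge layers can be produced and merged: XDoG-style luminance edges [WKO12] combined with a depth-edge map by a darkest-wins minimum. The parameter values are illustrative, and the combination rule is our reading of Fig. 16(d):

import numpy as np
from scipy.ndimage import gaussian_filter

def xdog_edges(lum, sigma=1.0, k=1.6, tau=0.98, eps=0.1, phi=10.0):
    # Soft-thresholded difference-of-Gaussians in the spirit of XDoG [WKO12];
    # returns values in [0, 1], where 0 is a black edge stroke.
    d = gaussian_filter(lum, sigma) - tau * gaussian_filter(lum, k * sigma)
    return np.where(d >= eps, 1.0, 1.0 + np.tanh(phi * (d - eps)))

def combine_edges(lum_edges, depth_edges):
    # Darkest-wins: keep a stroke wherever either edge map draws one,
    # cf. the combined edges of Figure 16(d).
    return np.minimum(lum_edges, depth_edges)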
(a) Cartoon - raymarched full shadow (b) Cartoon - raymarched half shadow (c) Cartoon - relighted full shadow
Figure 17: Raymarched shadows can be used to shadow (a), (b) or relight (c) the foreground subject.
c
2021 The Author(s)
Computer Graphics Forum c
2021 The Eurographics Association and John Wiley & Sons Ltd.
S. Shekhar et al. / Interactive Photo Editing on Smartphones via Intrinsic Decomposition(Supplementary Material)
(a) Cartoon - no shadow (b) Cartoon - hard shadow (c) Cartoon - halftoned shadows
Figure 18: Normal-angle based hard shadows can be used for scene emphasis (b), and can be combined with halftoning to create a “noir”
cartoon style (c).
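The normal-angle hard shadows of Fig. 18(b) can be sketched as thresholding the dot product between surface normals (derived from depth) and a virtual light direction; the helper below is illustrative:

import numpy as np

def hard_shadow_mask(normals, light_dir, threshold=0.0, darkness=0.4):
    # normals: (h, w, 3) unit surface normals; light_dir: unit 3-vector.
    # Pixels facing away from the virtual light (n . l < threshold) are
    # darkened; multiply the result into the stylized shading layer.
    ndotl = np.tensordot(normals, np.asarray(light_dir, dtype=np.float64),
                         axes=([-1], [0]))
    return np.where(ndotl < threshold, darkness, 1.0)

Combining this mask with the halftoning sketch above gives the "noir" look of Fig. 18(c).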
(a) Input (b) Very fine (σ = 14, τ = 2.0) (c) Fine (σ = 14, τ = 4.0)
(d) Medium (σ = 15.0, τ = 6.0) (e) Coarse (σ = 18.0, τ = 8.0) (f) Very Coarse (σ = 20.0, τ = 10.0)
Figure 19: Different levels of coarseness in quickshift shading. Here, σ controls the kernel size of the Gaussian distribution, while τ is the distance threshold used when comparing neighboring pixels and affects the size of the kernel window. For parameter details, please refer to Vedaldi et al. [VS08].
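For a concrete example, quickshift segmentation is available in scikit-image; the sketch below segments the shading layer and flattens every segment to its mean. Mapping σ to kernel_size and τ to max_dist is our assumption about how the above parameters correspond to that API:

import numpy as np
from skimage.segmentation import quickshift

def quickshift_shading(shading_rgb, sigma=15.0, tau=6.0):
    # Segment the (3-channel, float in [0, 1]) shading layer and replace
    # every segment by its mean, producing the flat regions of Figure 19.
    labels = quickshift(shading_rgb, kernel_size=sigma, max_dist=tau)
    out = np.zeros_like(shading_rgb, dtype=np.float64)
    for lab in np.unique(labels):
        m = labels == lab
        out[m] = shading_rgb[m].mean(axis=0)  # mean color of the segment
    return out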
5. Intrinsic Decomposition – More Results
Input Shading (Ours) Bell et al. [BBS14] Jeon et al. [JCTL14] Lettry et al. [LVVG18]
Figure 20: Comparison of intrinsic decomposition with other methods. The figure contains the input image and the corresponding shading (for Fig. 12 in the main paper) obtained using our method and the intrinsic decomposition methods of Bell et al. [BBS14], Jeon et al. [JCTL14], and Lettry et al. [LVVG18].
Input Albedo (Ours) Bell et al. [BBS14] Jeon et al. [JCTL14] Lettry et al. [LVVG18]
Figure 21: Comparison of intrinsic decomposition albedo with other methods for outdoor scenes. The figure contains the input image and the corresponding albedo obtained using our method and the intrinsic decomposition methods of Bell et al. [BBS14], Jeon et al. [JCTL14], and Lettry et al. [LVVG18].
Input Shading (Ours) Bell et al. [BBS14] Jeon et al. [JCTL14] Lettry et al. [LVVG18]
Figure 22: Comparison of intrinsic decomposition shading with other methods for outdoor scenes. The figure contains the input image and the corresponding shading obtained using our method and the intrinsic decomposition methods of Bell et al. [BBS14], Jeon et al. [JCTL14], and Lettry et al. [LVVG18].
6. Specularity Removal – More Results
Input Ground Truth Ours Fu et al. [FZS19] Akashi et al. [AO16]
Figure 23: Comparison of specularity removal for images captured in a lab setting. The figure contains the input images, their respective ground-truth diffuse versions, and the corresponding diffuse images obtained using our method and the specularity removal methods of Fu et al. [FZS19] and Akashi et al. [AO16].
7. Image Editing
In this section, we show three different types of image-editing results based on intrinsic decomposition. For ground-truth comparison, we make use of the intrinsic image dataset of Bonneel et al. [BKPB17]. In order to introduce image edits using the intrinsic layers, we make use of the appearance-editing framework from Beigpour et al. [BSM18]; the basic recomposition it builds on is sketched below.
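All three edits rely on the multiplicative intrinsic model I = A · S: one layer is edited and the image is recomposed. A minimal Python/NumPy sketch with our own helper names and an illustrative intensity-preserving recoloring rule, not the exact [BSM18] operators:

import numpy as np

def recompose(albedo, shading):
    # Intrinsic image model I = A * S; shading is assumed to be (h, w, 1).
    return np.clip(albedo * shading, 0.0, 1.0)

def recolor(albedo, shading, mask, target_rgb):
    # Replace the albedo color inside `mask` while keeping the per-pixel
    # albedo brightness and the shading layer untouched, so the illumination
    # stays plausible (an illustrative rule, not the exact [BSM18] operator).
    edited = albedo.copy()
    intensity = albedo[mask].mean(axis=-1, keepdims=True)
    edited[mask] = intensity * np.asarray(target_rgb, dtype=np.float64)
    return recompose(edited, shading)

Retexturing (Fig. 29) works analogously by compositing a texture into the albedo layer before recomposition, and material editing (Fig. 27) additionally manipulates the shading layer.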
As the first edit, we recolor the image below using the intrinsic layers obtained from the competing methods and compare the results with the ground truth,
Figure 24: Cream Box
Ground Truth Bell et al. [BBS14] Jeon et al. [JCTL14] Lettry et al. [LVVG18] Ours
Albedo
Shading
Edited
Figure 25: The computed intrinsic layers and the corresponding edited image. Note that the recolored image looks quite different for the method of Lettry et al. due to inaccuracies in the computed albedo.
For the next edit, we perform material editing on the image below,
Figure 26: Human Face
Ground Truth Bell et al. [BBS14] Jeon et al. [JCTL14] Lettry et al. [LVVG18] Ours
Albedo
Shading
Edited
Figure 27: For this edit, we try to give the skin a shiny, silver-metal-like look using the intrinsic layers. For the method of Lettry et al., the shading is multi-channel and also contains details of the skin texture and tone, thus preventing a plausible edit.
For the final edit, we perform retexturing on the image below,
Figure 28: Corridor
Ground Truth Bell et al. [BBS14] Jeon et al. [JCTL14] Lettry et al. [LVVG18] Ours
Albedo
Shading
Edited
Figure 29: As part of this edit, we retexture the floor of the corridor by compositing the image of a carpet into the albedo intrinsic layer. It is a challenging edit due to both shading and albedo variations in the given region. None of the methods achieves a flawless result; however, except for the method of Lettry et al., all are able to preserve the shading variations in the output.
References
[AO16] AKASHI Y., OKATANI T.: Separation of reflection components by sparse non-negative matrix factorization. Computer Vision and Image Understanding 146, C (May 2016), 77–85. doi:10.1016/j.cviu.2015.09.001. 15
[BBS14] BELL S., BALA K., SNAVELY N.: Intrinsic images in the wild. ACM Transactions on Graphics 33, 4 (July 2014). doi:10.1145/2601097.2601206. 13,14,16,17,18
[BKPB17] BONNEEL N., KOVACS B., PARIS S., BALA K.: Intrinsic decompositions for image editing. Computer Graphics Forum 36, 2 (May 2017), 593–609. doi:10.1111/cgf.13149. 16
[BSM18] BEIGPOUR S., SHEKHAR S., MANSOURYAR M., MYSZKOWSKI K., SEIDEL H.-P.: Light-field appearance editing based on intrinsic decomposition. Journal of Perceptual Imaging 1, 1 (2018), 15. doi:10.2352/J.Percept.Imaging.2018.1.1.010502. 16
[FZS19] FU G., ZHANG Q., SONG C., LIN Q., XIAO C.: Specular highlight removal for real-world images. Computer Graphics Forum 38, 7 (2019), 253–263. doi:10.1111/cgf.13834. 15
[JCTL14] JEON J., CHO S., TONG X., LEE S.: Intrinsic image decomposition using structure-texture separation and surface normals. In European Conference on Computer Vision (ECCV) (2014), pp. 218–233. doi:10.1007/978-3-319-10584-0_15. 13,14,16,17,18
[LVVG18] LETTRY L., VANHOEY K., VAN GOOL L.: Unsupervised deep single-image intrinsic decomposition using illumination-varying image sequences. Computer Graphics Forum 37, 7 (2018), 409–419. doi:10.1111/cgf.13578. 13,14,16,17,18
[VS08] VEDALDI A., SOATTO S.: Quick shift and kernel methods for mode seeking. In European Conference on Computer Vision (ECCV) (2008), pp. 705–718. doi:10.1007/978-3-540-88693-8_52. 10,12
[WKO12] WINNEMÖLLER H., KYPRIANIDIS J. E., OLSEN S. C.: XDoG: An extended difference-of-Gaussians compendium including advanced image stylization. Computers & Graphics 36, 6 (2012), 740–753. doi:10.1016/j.cag.2012.03.004. 10
[WOG06] WINNEMÖLLER H., OLSEN S. C., GOOCH B.: Real-time video abstraction. ACM Transactions on Graphics (TOG) 25, 3 (2006), 1221–1226. 10