Conference Paper

Multiscopic HDR Image sequence generation

  • CYENS Research Center on Interactive Media Smart Systems and Emerging Technologies

Abstract and Figures

Creating High Dynamic Range (HDR) images of static scenes by combining several Low Dynamic Range (LDR) images is a common procedure nowadays. However, 3D HDR video acquisition hardware barely exists. Limitations in acquisition, processing, and display make it an active, unsolved research topic. This work analyzes the latest advances in 3D HDR imaging and proposes a method to build multiscopic HDR images from LDR multi-exposure images. Our method is based on a patch match algorithm that has been adapted and improved to take advantage of the epipolar geometry constraints of stereo images. To our knowledge, it is the first time that an approach other than traditional stereo matching has been used to obtain accurate matching between stereo images. Experimental results show accurate registration and HDR generation for each LDR view.
Raissel Ramirez Orozco
Group of Geometry and
Universitat de Girona
(UdG), Spain
Céline Loscos
Université de Reims
(URCA), France
Ignacio Martin
Group of Geometry and
Universitat de Girona
(UdG), Spain
Alessandro Artusi
Graphics & Imaging
Universitat de Girona
(UdG), Spain
High Dynamic Range, Stereoscopic HDR, Stereo Matching, Image Deghosting
High Dynamic Range (HDR) imaging is a growing
area of interest at both academic and industrial levels, and one
of its crucial aspects is reliable and easy content creation
with existing digital camera hardware.
Digital cameras able to capture an extended dynamic
range are appearing on the consumer market.
They either use a sensor capable of capturing an intensity
range larger than the one captured by traditional 8-10
bit sensors, or combine higher bit-depth acquisition sensors
with software to greatly increase the acquired intensity
range. However, due to their high cost, their use is very
limited [? ].
Traditional low dynamic range (LDR) camera sensors
provide an auto-exposure feature that can be used to
increase the dynamic range of light captured from the
scene. The main idea is to capture the same scene at
different exposure levels, and then combine the captures to
reconstruct the full dynamic range.
To achieve this, different approaches have been presented
[MP95, DM97, RBS99, MN99, RBS03], but
they are not free of drawbacks. Ghosting effects
may appear in the reconstructed HDR image when the
pixels in the source images are not perfectly aligned.
This has two main causes: camera movement
or movement of objects in the scene. Several solutions
for general image alignment exist; for an extensive
survey see Zitová and Flusser [ZF03]. However, such
methods are not straightforward to apply because
the exposures in the image sequence differ, which makes
alignment a difficult problem.
High Dynamic Range content creation has lately been moving
from the 2D to the 3D imaging domain, introducing a
series of open problems that need to be solved. 3D images
are displayed in two main ways: either
from two views for stereoscopic displays with glasses
or from multiple views for auto-stereoscopic displays.
Most current auto-stereoscopic displays accept from
five to nine different views [LLR13]. To our knowledge,
HDR auto-stereoscopic displays do not exist yet.
We can feed LDR auto-stereoscopic displays with
tone-mapped HDR, but we will need at least five different
views.
Some of the techniques used for 2D applications have
recently been extended to multiscopic
images [TKS06, LC09, SMW10, Ble, BLV 14, SDBRC14]. However, most of
these solutions share a common limitation: they
rely on accurate dense stereo matching between
images, which may fail when brightness differs
between exposures [BVNL14].
(a) Non aligned (b) Bätz et al. [BRG+14] (c) Our Result
Figure 1: Set of LDR multi-view images from the IIS Jumble data-set, courtesy of Bätz [BRG+14]. The top row shows five
different stereo exposures. The bottom row shows HDR images obtained without alignment (a), using Bätz’s method (b) and
using our proposed patch-match method (c).
Thus, a more robust and faster solution for matching
differently exposed images, one that allows easy and reliable
acquisition of multiscopic HDR content, is highly
desirable.
In response to this need, in this paper we propose a
solution to combine sets of multiscopic LDR images
into HDR content using image correspondences based
on the Patch Match algorithm [BSFG09]. This algorithm
was recently used by Sen et al. [SKY+12] to
build HDR images that are free of ghosting effects. The
need to improve the coherence of neighboring patches
was already presented in [? ]. The results were promising
for multi-exposure sequences where the reference
image is moderately under-exposed or saturated, but the method
fails when the reference image has large under-exposed
or saturated areas.
We propose to adapt this approach to multiscopic image
sequences (Figure 1) that follow a simplified
epipolar geometry obtained with parallel optical axes
(images not originally taken with this geometric configuration
can be rectified afterwards). In particular, we reduce
the search space in the matching process and mitigate
the incoherence problem of Patch Match. Each
image in the set of multi-exposed images is used as a
reference, and we look for matches in all the remaining images.
These accurate matches allow us to synthesize images
corresponding to each view, which are merged into
one HDR image per view.
Our contributions to the field can be summarized as
follows:
  • We provide an efficient solution to multiscopic HDR
    image generation.
  • Traditional stereo matching presents several drawbacks
    when directly applied to images with different
    exposures. We introduce the use of an improved
    version of patch-match to overcome these drawbacks.
  • The patch-match algorithm was adapted to take advantage
    of the epipolar geometry, reducing its computational
    cost while improving its matching coherence.
Two main areas were considered in this work. The
following section presents the main state of the art re-
lated to stereo HDR acquisition and multi-exposed im-
age alignment for HDR generation.
2.1 Stereo HDR Acquisition
Some prototypes have been proposed to acquire stereo
HDR content from multi-exposure views. Most approaches
[TKS06, LC09, SMW10, Ruf11, BRG+14,
AKCG14] are based on a rig of two cameras placed in
a conventional stereo configuration that captures differently
exposed images. Troccoli et al. [TKS06] proposed
to use cross-correlation stereo matching to get a
primary disparity match. The correspondences are used
to calculate the camera response function (CRF), which converts
pixel values to radiance space. Stereo matching is
then executed again, this time in radiance space, to extract the
depth maps.
Lin and Chang [LC09] use SIFT descriptors to find correspondences.
The best correspondences are selected
using epipolar constraints and used to calculate the CRF.
The stereo matching algorithm is based on belief propagation
to derive the disparity map. A ghost removal
technique is used to avoid artifacts due to noise or stereo
mismatches. Even so, disparity maps are not accurate
in large areas that are under-exposed or saturated.
Rüfenacht [Ruf11] compares two different approaches
to obtain stereoscopic HDR video content: a temporal
approach, where exposures are captured by temporally
changing the exposure time of two synchronized cameras
to get two frames of the same exposure per shot,
and a spatial approach, where the cameras have different
exposure times for all the shots, so the two frames of the
same shot are exposed differently.
Bonnard et al. [BLV+12] propose a methodology
to create content that combines depth (3D) and HDR
video for auto-stereoscopic displays. They use depth
information reconstructed from epipolar geometry to
drive the pixel matching procedure. The matching method
lacks robustness, especially in under-exposed or saturated
areas. Akhavan et al. [AYG13, AKCG14] offer
a useful comparison of the differences between disparity
maps obtained from HDR, LDR and tone-mapped
images.
Selmanovic et al. [SDBRC14] propose to generate
stereo HDR video from an HDR-LDR pair, using an
HDR camera and a traditional digital camera. In this
case, one HDR view needs to be reconstructed. Three
methods were proposed to generate this HDR view: (1)
warping the existing one using a disparity map, (2) increasing
the range of the LDR view using an expansion
operator and (3) a hybrid of the two methods, which
provides the best results.
Bätz et al. [BRG+14] present a framework with two
LDR cameras in which the input images are rectified before
disparity estimation. Their exposure-invariant stereo matching
uses Zero-Mean Normalized Cross Correlation
(ZNCC) as a matching cost. The matching is performed
on the gray-scale radiance space image, followed by
local optimization and disparity refinement. Some artifacts
may persist in the saturated areas.
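To illustrate why ZNCC suits differently exposed images, the following minimal sketch computes it for two flat gray-scale patches. It is a generic textbook formulation, not the implementation of Bätz et al.; the function name `zncc` and the sample patch values are assumptions made for this example.

```python
import math

def zncc(patch_a, patch_b):
    """Zero-Mean Normalized Cross Correlation of two equally sized patches.

    Patches are flat lists of gray-scale values. The result lies in [-1, 1];
    a value near 1 means the patches agree up to a gain and an offset.
    """
    if len(patch_a) != len(patch_b) or not patch_a:
        raise ValueError("patches must be non-empty and of equal size")
    mean_a = sum(patch_a) / len(patch_a)
    mean_b = sum(patch_b) / len(patch_b)
    da = [a - mean_a for a in patch_a]
    db = [b - mean_b for b in patch_b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # A flat patch has zero variance and carries no texture to correlate.
    return num / den if den > 0 else 0.0

# Hypothetical 3x3 patch and the same patch seen with four times the exposure.
dark = [10, 12, 11, 30, 35, 31, 10, 11, 12]
bright = [4 * v for v in dark]
score = zncc(dark, bright)  # close to 1 despite the brightness change
```

Because the mean is subtracted and the result is normalized, a pure change of gain between exposures leaves the score essentially unchanged. Saturated patches become flat, which is consistent with the artifacts reported in saturated areas.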
2.2 Multi-exposed Image Alignment
In the HDR context, most methods for image
alignment focus on movement between images caused
by hand-held capture, small movements of tripods or
matching moving pixels from dynamic objects in the
scene. One of the main drawbacks for HDR video
acquisition is the lack of robust algorithms for deghosting.
Both Hadziabdic et al. [HTM13] and Srikantha
et al. [SS12] provide good reviews and comparisons
of recent methods.
Kang et al. [KUWS03] proposed to capture video sequences
alternating long and short exposure times. Adjacent
frames are warped and registered to finally generate
an HDR frame. Sand and Teller [ST04] combine
feature matching and optical flow for spatio-temporal
alignment of differently exposed videos. They search for
the frames that best match, using locally weighted regression
to interpolate and extrapolate image correspondences.
This method is robust to changes in exposure
and lighting, but it is slow and artifacts may appear if
there are objects moving at high speed.
Mangiat and Gibson [MG10] propose to use a
block-based motion estimation method and refine the motion
vectors in saturated regions using color similarity
in the adjacent frames of an alternating multi-exposed
sequence.
Sun et al. [SMW10] assume that the disparity map be-
tween two rectified images can be modeled as a Markov
random field. The matching problem is then posed as a
Bayesian labeling problem in which the optimal values
are obtained minimizing an energy function. The en-
ergy function is composed of a pixel dissimilarity term
(using NCC as similarity measure) and a smoothness
term which corresponds to the MRF likelihood and the
MRF prior, respectively.
Sen et al. [SKY+12] recently presented a method based
on a patch-based energy-minimization formulation that
integrates alignment and reconstruction in a joint optimization.
This makes it possible to produce an HDR result that is
aligned to one of the exposures and contains information
from all the rest. Artifacts may appear when there
are large under-exposed or saturated areas in the reference
image.
2.3 Discussion
Stereo matching is a mature research field; very accurate
algorithms are available for images taken under the
same lighting conditions and exposure. However, most
of these algorithms are not accurate for images with
significant lighting variations. We propose a novel framework
inspired by Barnes et al. [BSFG09] and Sen et
al. [SKY+12]. We adapt the matching process to the
multiscopic context, resulting in a more robust solution.
Our method takes as input a sequence of LDR images
(RAW or not). We transform the input images to radiance
space, and all the remaining steps are performed using
radiance space values instead of RGB pixels. For 8-bit
LDR images a CRF per camera needs to be estimated.
An overview of our framework is shown in the
diagram of Figure 2. The first step is to recover the
correspondences between the n images of the set. We
propose to use a nearest neighbor search algorithm (see
Section 3.1) instead of a full stereo matching approach.
Each image acts as a reference for the matching process.
The output of this step is n-1 warped images for
each exposure. Once we have all the matches, they are
combined into the output HDR image in a second
step (see Section 3.2).
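The radiance space conversion mentioned above can be sketched as follows. This is a minimal illustration that assumes the usual model in which relative radiance is the inverse CRF of the pixel value divided by the exposure time. The pure gamma curve stands in for a calibrated CRF and is an assumption, as are all function names.

```python
def to_radiance(pixel, exposure_time, inverse_crf):
    """Map an 8-bit pixel value in [0, 255] to relative radiance."""
    if exposure_time <= 0:
        raise ValueError("exposure_time must be positive")
    exposure = inverse_crf(pixel / 255.0)  # sensor exposure in [0, 1]
    return exposure / exposure_time

def inverse_gamma_crf(z, gamma=2.2):
    """Hypothetical inverse CRF: undo a pure gamma encoding."""
    return z ** gamma

def linear_inverse_crf(z):
    """Inverse CRF for linear data such as RAW: the identity."""
    return z

# The same scene point seen by a linear sensor at 5 ms and at 10 ms.
e_short = to_radiance(64, 0.005, linear_inverse_crf)
e_long = to_radiance(128, 0.010, linear_inverse_crf)
# Both values describe the same radiance, so patches become comparable.
```

Working in this space is what lets a patch from a short exposure be compared directly with a patch from a long one.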
3.1 Nearest Neighbor Search
For a pair of images $I_r$ and $I_s$, we compute a Nearest
Neighbor Field (NNF) from $I_r$ to $I_s$ using an improved
version of the method presented by Barnes et al.
[BSFG09]. The NNF is defined over patches around every
pixel coordinate in image $I_r$ for some cost function $D$
between two patches of images $I_r$ and $I_s$. Given a patch
coordinate $r \in I_r$ and its corresponding nearest neighbor
$s \in I_s$, $NNF(r) = s$. The values of the NNF for all coordinates
are stored in an array with the same dimensions
as $I_r$.
Figure 2: Proposed framework for multiscopic HDR generation. It is composed of three main steps: (1) radiance space
conversion, (2) patch match correspondences search and (3) HDR generation.
We start by initializing the NNFs using random transformation
values within a maximal disparity range on the
same epipolar line. The NNF is then improved
by minimizing $D$ until convergence or until a maximum
number of iterations is reached. Two candidate sets are
used in the search phase, as suggested by [BSFG09]:
(1) Propagation uses the known adjacent nearest neighbor
patches to improve the NNF. It converges fast but it may
fall into a local minimum.
(2) Random search introduces a second set of random
candidates that are used to avoid local minima. For each
patch centered at pixel $v_0$, the candidates $u_i$ are sampled
at an exponentially decreasing distance from $v_0$:
$$u_i = v_0 + w \alpha^i R_i \quad (1)$$
where $R_i$ is a uniform random value in $[-1, 1]$, $w$ is the
maximum value for the disparity search and $\alpha$ is a fixed
ratio (1/2 is suggested).
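As an illustration of the random search step restricted to one epipolar line, the sketch below samples candidate columns as the current match plus w times alpha to the power i times a uniform random value in [-1, 1]. The function name, the clamping to the row width and the stop criterion (radius below one pixel, as in the original Patch Match) are assumptions of this example rather than details taken from our implementation.

```python
import random

def random_search_candidates(x0, max_disparity, width, alpha=0.5, rng=random):
    """Sample 1D candidate columns around the current match x0.

    Only the column changes because the search stays on the same image row
    (the epipolar line of a rectified pair). Sampling stops when the search
    radius drops below one pixel.
    """
    candidates = []
    radius = float(max_disparity)
    while radius >= 1.0:
        offset = radius * rng.uniform(-1.0, 1.0)
        x = int(round(x0 + offset))
        x = min(max(x, 0), width - 1)  # clamp to the image row
        candidates.append(x)
        radius *= alpha
    return candidates

# With w = 64 and alpha = 1/2 the radii are 64, 32, 16, 8, 4, 2 and 1,
# so seven candidates are tested for each patch.
example = random_search_candidates(100, 64, 640, rng=random.Random(0))
```

Each candidate would then be scored with the cost function and kept only if it improves on the current match.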
Taking advantage of the epipolar geometry improves both
search accuracy and computational performance.
Geometrically calibrated images allow us to
reduce the search space from a 2D to a 1D domain.
For example,
during random search we only look for matches within the
range of maximal disparity on the same epipolar line
(1D domain), avoiding a search in 2D space. This
significantly reduces the number of samples needed to find a
valid match.
A typical drawback of the original NNF approach
[BSFG09], used in the patch match algorithm, is the
lack of geometric coherence in its search results. This
problem is illustrated in Figures 3 and 4. Two static
neighboring pixels in the reference image match two
separated pixels in the source image (Figure 3).
Figure 3: Patches from the reference image (Up) look for
their NN in the source image (Down). Even when destination
patches are similar in terms of color, matches may be wrong
because of geometric coherency problems.
To overcome this drawback we propose a new distance
cost function $D$ that incorporates a coherence term to
penalize matches that are not coherent with the transformation
of their neighbors. Both Barnes et al. [BSFG09]
and Sen et al. use the Sum of Squared Differences
(SSD), described in Equation 3, where $T$ represents the
transformation between patches of $N$ pixels in images
$I_r$ and $I_s$. We propose to penalize matches with transformations
that differ significantly from those of their neighbors
by adding the coherence term $C$ defined in Equation 4.
The variable $d$ represents the Euclidean distance to the
closest neighbor’s match and $Max$ is the maximum
disparity value. This new cost function forces pixels to
preserve coherent transformations with their neighbors.
$$D = \frac{SSD(r, s)}{C(r, s)} \quad (2)$$
$$SSD(r, s) = \sum_{n=1}^{N} \left( I_r - T(I_s) \right)^2 \quad (3)$$
$$C(r, s) = 1 - \frac{d}{Max} \quad (4)$$
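A minimal sketch of this cost, assuming matches are stored as 1D offsets on the epipolar line and that D is the SSD divided by C, with C equal to one minus d over the maximum disparity. Clamping d so that C never reaches zero is an assumption of this example, as are all function names and the toy values.

```python
def ssd(patch_r, patch_s):
    """Sum of squared differences between two flat patches."""
    return sum((a - b) ** 2 for a, b in zip(patch_r, patch_s))

def coherence(candidate_offset, neighbor_offsets, max_disparity):
    """C = 1 - d / Max, with d the distance to the closest neighbor's match."""
    d = min(abs(candidate_offset - n) for n in neighbor_offsets)
    d = min(d, max_disparity - 1)  # assumption: keep C strictly positive
    return 1.0 - d / max_disparity

def cost(patch_r, patch_s, candidate_offset, neighbor_offsets, max_disparity):
    """D = SSD / C: incoherent candidates see their color cost inflated."""
    c = coherence(candidate_offset, neighbor_offsets, max_disparity)
    return ssd(patch_r, patch_s) / c

ref = [10, 10, 10, 10]
src = [15, 15, 15, 15]   # SSD = 4 * 25 = 100 for both candidates
neighbors = [12, 13]     # offsets already found for adjacent pixels
coherent = cost(ref, src, 13, neighbors, 64)    # d = 0,  C = 1.0, D = 100
incoherent = cost(ref, src, 45, neighbors, 64)  # d = 32, C = 0.5, D = 200
```

Two candidates that look equally good in color are thus separated by how well they agree with the matches of neighboring pixels.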
(a) Src Image (b) Ref Image
(c) PM NNF (d) Ours NNF
(e) PM synthesized (f) Ours synthesized
(g) Details in (e) (h) Details in (f)
Figure 4: Matching results using original Patch Match
[BSFG09] (Left) and our implementation (right) for two iter-
ations using 7x7 patches. Images in the ’Art’ dataset courtesy
of [vis06]
Figures 4c and 4e show the influence of the coherence
problems described in Figure 3 on the matching results.
Figures 4d and 4f correspond to the results that include
the improvements presented in this section. Figures 4c
and 4d show a color representation of the NNFs in
HSV color space: the magnitude of the transformation
vector is visualized in the saturation channel and its
angle in the hue channel. Areas shown with the
same color in the NNF representation have similar
transformations. Objects at the same depth are expected to have
similar transformations. Notice that the original Patch
Match [BSFG09] finds very different transformations
for neighboring pixels of the same object and produces
artifacts in the synthesized image.
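The color coding of the NNF described above can be sketched with the standard library `colorsys` module. The normalization by a maximum magnitude and the fixed value channel are assumptions of this example; the text only specifies that magnitude goes to saturation and angle to hue.

```python
import colorsys
import math

def offset_to_rgb(dx, dy, max_magnitude, value=1.0):
    """Encode one NNF offset as an RGB triple via HSV.

    Hue carries the angle of the offset, saturation its magnitude
    relative to max_magnitude. The value channel is held constant.
    """
    magnitude = math.hypot(dx, dy)
    saturation = min(magnitude / max_magnitude, 1.0) if max_magnitude > 0 else 0.0
    hue = (math.atan2(dy, dx) % (2.0 * math.pi)) / (2.0 * math.pi)
    return colorsys.hsv_to_rgb(hue, saturation, value)

no_motion = offset_to_rgb(0, 0, 64)    # zero saturation: an achromatic pixel
right = offset_to_rgb(64, 0, 64)       # hue 0, full saturation
left = offset_to_rgb(-64, 0, 64)       # hue 0.5, full saturation
```

Small, homogeneous offsets therefore appear as desaturated regions, while incoherent matches show up as isolated patches of strong color.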
3.2 Warping Images and HDR Generation
The warped images are generated as an average of the
patches that contribute to a certain pixel. Direct warping
from the NNFs is possible, but it may generate visible
artifacts, as shown in Figure 5. This is mainly due
to incoherent matches between $I_r$ and $I_s$.
To solve these problems we use the Bidirectional Similarity
Measure (BDSM) (Equation 5), proposed by Simakov
et al. [SCSI08] and used by Barnes et al. [BSFG09].
The BDSM is a similarity measure between pairs of images.
It is defined for every patch $Q \subset I_r$ and $P \subset I_s$,
and a number $N$ of patches in each image respectively.
It consists of two terms: coherence ensures that the output
is geometrically coherent with the reference, while
completeness ensures that the output image maximizes
the amount of information from the source image:
$$d(I_s, I_r) = \overbrace{\frac{1}{N_{I_s}} \sum_{P \subset I_s} \min_{Q \subset I_r} D(P, Q)}^{d_{complete}} + \overbrace{\frac{1}{N_{I_r}} \sum_{Q \subset I_r} \min_{P \subset I_s} D(Q, P)}^{d_{cohere}} \quad (5)$$
This improves both coherence and consistency
by using bidirectional NNFs (from $I_r$ to $I_s$ and backward).
It is more accurate to generate images using
three iterations in each direction than six iterations only from $I_r$
to $I_s$. Using BDSM also prevents artifacts in
occluded areas.
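The two terms of the bidirectional similarity can be illustrated on 1D signals. This sketch follows the general definition of Simakov et al. and uses a brute-force nearest patch search in place of the NNFs, which is an assumption made to keep the example short; the function names and signals are hypothetical.

```python
def patch_distance(p, q):
    """Sum of squared differences between two equally sized patches."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def extract_patches(signal, size):
    """All contiguous patches of the given size from a 1D signal."""
    return [signal[i:i + size] for i in range(len(signal) - size + 1)]

def bdsm(source, target, size=3):
    """Return (completeness, coherence) between a source and a target.

    Completeness: every source patch should appear somewhere in the target.
    Coherence: every target patch should come from somewhere in the source.
    """
    src = extract_patches(source, size)
    tgt = extract_patches(target, size)
    completeness = sum(min(patch_distance(p, q) for q in tgt) for p in src) / len(src)
    coherence = sum(min(patch_distance(q, p) for p in src) for q in tgt) / len(tgt)
    return completeness, coherence

source = [1, 2, 3, 4, 5, 6]
target = [1, 2, 3]
completeness, coherence_term = bdsm(source, target)
```

Here the target is fully coherent with the source (its only patch exists in the source) but far from complete, since most source patches have no counterpart in the target.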
(a) Direct warping (b) Using BDSM
(c) Details in (a)
(d) Details in (b)
Figure 5: Images 5a and 5b are both synthesized from the
pair in Figure 4. Image 5a was directly warped using values
only from the NNF of Figure 4c, which corresponds to
matching 4a to 4b. Image 5b was warped using the BDSM of
Equation 5, which involves both NNFs of Figures 4c and 4d.
Since the matching is totally independent for pairs of
images, it was implemented in parallel. Each image
matches all other views. This produces n-1 NNFs for
each view. The NNFs are in fact the two components of
the BDSM of equation 5. The new image is the result of
accumulating pixel colors of each overlapping neighbor
patch and averaging them.
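The accumulation and averaging of overlapping patches can be sketched on a single image row. The representation of the NNF as a list of source start columns and the function name `vote_row` are assumptions of this example, and only one direction of the NNF is used.

```python
def vote_row(source_row, nnf, patch_size):
    """Synthesize one output row by averaging overlapping matched patches.

    nnf[x] is the start column, in source_row, of the patch matched to the
    output patch that starts at column x.
    """
    width = len(nnf) + patch_size - 1
    acc = [0.0] * width
    count = [0] * width
    for x, sx in enumerate(nnf):
        for k in range(patch_size):
            acc[x + k] += source_row[sx + k]  # accumulate the patch colors
            count[x + k] += 1
    return [a / c for a, c in zip(acc, count)]

row = [0, 10, 20, 30, 40, 50]
coherent = vote_row(row, [1, 2, 3], 3)    # a uniform shift of one pixel
incoherent = vote_row(row, [1, 3, 3], 3)  # one neighbor jumps elsewhere
```

With coherent matches every overlapping patch votes for the same color, so the output is a clean shifted copy. With an incoherent match the votes disagree and the average blurs the result, which is the kind of artifact the coherence term is meant to reduce.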
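The final HDR image is generated using a weighted average
[MP95, DM97, MN99] as defined in Equation 6
and the weighting function of Equation 7 proposed by
Khan et al. [KAR06]:
$$E(i, j) = \frac{\sum_{n=1}^{N} w(I_n(i, j)) \left( \frac{f^{-1}(I_n(i, j))}{\Delta t_n} \right)}{\sum_{n=1}^{N} w(I_n(i, j))} \quad (6)$$
$$w(I_n) = 1 - \left( 2 \frac{I_n}{255} - 1 \right)^{12} \quad (7)$$
where $I_n$ represents each image in the sequence, $w$ corresponds
to the weight, $f$ is the CRF and $\Delta t_n$ is the exposure
time of the image $I_n$ of the sequence.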
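A per-pixel sketch of this merge, assuming 8-bit values, the weighting function of Khan et al. and a linear inverse CRF as a stand-in for a calibrated one. The fallback to a plain average when every sample has zero weight is an assumption of this example; the function names are hypothetical.

```python
def khan_weight(z):
    """Weight 1 - (2z/255 - 1)^12: near 1 mid-range, 0 at both extremes."""
    return 1.0 - (2.0 * z / 255.0 - 1.0) ** 12

def merge_hdr_pixel(values, exposure_times, inverse_crf):
    """Weighted average of per-exposure radiance estimates for one pixel."""
    num = 0.0
    den = 0.0
    for z, dt in zip(values, exposure_times):
        w = khan_weight(z)
        num += w * inverse_crf(z) / dt
        den += w
    if den == 0.0:
        # Assumption: all samples are fully dark or saturated, so fall back
        # to an unweighted average instead of dividing by zero.
        estimates = [inverse_crf(z) / dt for z, dt in zip(values, exposure_times)]
        return sum(estimates) / len(estimates)
    return num / den

def linear_inverse_crf(z):
    """Stand-in inverse CRF for linear data: scale 8-bit values to [0, 1]."""
    return z / 255.0

# One scene point seen at 5, 10 and 20 ms by a linear sensor.
radiance = merge_hdr_pixel([32, 64, 128], [0.005, 0.010, 0.020], linear_inverse_crf)
```

The steep exponent keeps the weight close to one over most of the range and lets it drop quickly near 0 and 255, so under-exposed and saturated samples contribute little to the result.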
Five data-sets were selected in order to demonstrate the
robustness of our results. For the ’Octo-cam’ set, all the
lenses capture the scene at the same time with a synchronized
shutter speed. For the rest of the data-sets the
scenes are static. This avoids the ghosting problem due
to dynamic objects in the scene. In all figures of this
paper we use the different LDR exposures for display
purposes only; the actual matching is done in radiance
space.
Figure 6: The Octo-cam multi-view camera prototype.
The ’Octo-cam’ data-set (Figure 11) consists of eight
RAW images with 10 bits of color depth per channel.
They were acquired simultaneously using the Octo-cam
[PcPD+10] with a resolution of 748x422 pixels. The
Octo-cam is a multi-view camera prototype composed
of eight lenses arranged horizontally (Figure 6).
All images are taken at the same shutter speed (40 ms),
but we use three pairs of neutral density filters that reduce
the exposure by factors of 2, 4 and 8 respectively.
The exposure times for the input sequence are therefore equivalent
to 5, 10, 20 and 40 ms respectively [BLV+12]. The
lenses are synchronized so all images correspond
to the same time instant. There are no differences in the
scene caused by the movement of dynamic objects.
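The equivalent exposure times follow directly from the shutter speed and the attenuation of each filter pair. The short sketch below reproduces that arithmetic; treating the unfiltered pair as a factor of 1 is an assumption made for the example, and the function names are hypothetical.

```python
import math

def equivalent_exposures_ms(shutter_ms, attenuation_factors):
    """Exposure time that would give the same light without the filter."""
    return [shutter_ms / factor for factor in attenuation_factors]

def span_in_stops(exposures_ms):
    """Distance between the shortest and longest exposure, in stops."""
    return math.log2(max(exposures_ms) / min(exposures_ms))

# Three filtered lens pairs (factors 8, 4, 2) and one unfiltered pair.
exposures = equivalent_exposures_ms(40.0, [8, 4, 2, 1])  # [5.0, 10.0, 20.0, 40.0]
stops = span_in_stops(exposures)
```

The four equivalent exposures are one stop apart, so the sequence spans three stops between its darkest and brightest views.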
The sets ’Aloe’, ’Art’ and ’Dwarves’ are image sets obtained
from the Middlebury web site [vis06]. We selected
images that were acquired under fixed illumination
conditions with shutter speed values of 125, 500
and 2000 ms for ’Aloe’ and ’Art’ and values of 250,
1000 and 4000 ms for ’Dwarves’. They have a resolution
of 1390 x 1110 pixels and were taken from three
different views. Even though we have only three different exposures,
we can use the seven available views by alternating
the exposures, as shown in the following images.
The last two data-sets were obtained from two of the
state-of-the-art papers. Bätz et al. [BRG+14] shared
their image data set (IIS Jumble) at a resolution of
2560x1920 pixels. We selected five different views
from their images. They were acquired at shutter
speeds of 5, 30, 61, 122 and 280 ms respectively. Selmanovic
et al. [SDBRC14] provided us with 20 pairs of
HDR images, including both captured scenes and synthetic
examples. For 8-bit LDR data sets, the CRF is recovered
using a set of multiple exposures of a static scene.
All LDR images are also transformed to radiance space
for fair comparison with other algorithms.
4.1 Results and discussion
(a) Src Image (b) Ref Image
(c) PM NNF (d) Ours NNF
(e) PM synthesized (f) Ours synthesized
(g) Details in (e) (h) Details in (f)
Figure 7: Comparison between original Patch Match and
our implementation for two iterations using 7x7 patches. Images
7c and 7d show the improvement in the coherence of the
NNF using our method. Images courtesy of [SDBRC14]
Figure 7 shows a pair of images linearized from HDR images
courtesy of Selmanovic et al. [SDBRC14] and the
comparison between the original PM from Barnes et
al. [BSFG09] and our method including the coherence
term and epipolar constraints. The images in Figures
7c and 7d represent the NNF. They are encoded as an
image in HSV color space. The magnitude of the transformation
vector is visualized in the saturation channel and
the angle in the hue channel. Notice that our result represents
more homogeneous transformations, shown
in gray. The images in Figures 7e and 7f are synthesized
result images for the Ref image obtained using
pixels only from the Src image. The results correspond
to the same number of iterations (2 in this case). Our
implementation converges faster, producing accurate results
in fewer iterations than the original method.
Figure 8 shows the NNFs and the images synthesized
for different iterations of both our method and the original
patch match. Our method converges faster and produces
more coherent results than [BSFG09]. Matches
may not be accurate in terms of geometry in occluded
areas, which is expected because there is no information
to compare with when a certain area is occluded.
Even in such cases, the result is accurate in terms of
color. In our tests, two iterations of our
method were enough to get good results, while five iterations
were recommended for previous approaches.
The whole matching and synthesis process is performed
in radiance space. The results were converted to LDR using
the corresponding exposure times and the CRF for
display purposes only. The use of an image synthesis
method like the BDSM instead of traditional stereo
matching allows us to synthesize values for occluded
areas too.
Figure 9 shows one example of the generated HDR image corresponding
to the lowest exposure LDR view in the IIS
Jumble data-set. It is the result of merging all synthesized
images obtained with the first view as reference.
The darkest image is also the one that contains
the most noisy and under-exposed areas. HDR values were
recovered even for such areas and no visible artifacts
appear. In contrast, the problem of recovering
HDR values for saturated areas in the reference image
remains unsolved. When the dynamic range differences
are extreme the algorithm does not provide accurate results.
Future work must provide new techniques, because
the lack of information inside saturated areas
does not allow patches to find good matches.
All HDR images in this section were linearized to 8-bit
LDR images; no gamma correction was performed. The
CRFs for the LDR images were calculated from a set of
aligned multi-exposed images using the software RASCAL,
provided by Mitsunaga and Nayar [MN99].
Figure 10 shows the result of our method for a whole
set of LDR multi-exposed images. All obtained images
are accurate in terms of contours, and no visible artifacts
were introduced compared with the LDR images.
Figure 11 shows the result of the proposed method in
a scene with significant lighting variations. The presence
of the light spot introduces extreme lighting differences
between the different exposures. For higher exposures
the light glows from the spot and saturates pixels
not only inside the spot but also around it. There
is no information in saturated areas and the matching
algorithm does not find good correspondences. The dynamic
range is then compromised in such areas and they
remain saturated.
Our method is not only accurate but also faster than previous
solutions. Sen et al. [SKY+12] mention that their method
takes less than 3 minutes for a sequence of 7 images
of 1350x900 pixels. The combination of a reduced
search space and the coherence term effectively leads to
a reduction of the processing time. On an Intel Core i7-2620M
at 2.70 GHz with 8 GB of memory, our method
takes less than 2 minutes (103 ± 10 seconds) for the
Aloe data set with a resolution of 1282x1110 pixels.
We would like to make a quantitative comparison with
state-of-the-art methods like [BRG+14], but no execution
times were published.
(a) IIS Jumble data-set
(b) Lower exposure LDR (c) Tone-mapped HDR
(d) Details in (b) (e) Details in (c)
Figure 9: Details of the generated HDR image correspond-
ing to the darker exposure. Notice that under-exposed areas,
traditionally difficult to recover, are successfully generated
without visible noise or misaligned artifacts.
(a) Reference
(b) Source
(c) 1 iteration ours
(d) 1 iteration PM
(e) 2 iteration ours
(f) 2 iteration PM
(g) 10 iteration ours
(h) 10 iteration PM
Figure 8: Two images from the ’Dwarves’ set of LDR multi-view images from Middlebury [vis06]. Our method
achieves very accurate matches with only two iterations. Notice that the original patch match requires more iterations
to achieve good results in fine details of the image.
Figure 10: Up: ’Aloe’ set of LDR multi-view images from Middlebury web page [vis06]. Down: the resulting tone mapped
HDR taking each LDR as reference respectively. Notice the coherence between all generated images.
This paper presented a framework for auto-stereoscopic
3D HDR content creation that combines sets of multiscopic
LDR images into HDR content using dense image
correspondences. Dense image correspondence
methods used in the 2D domain cannot be used for 3D
HDR content creation without introducing visible artifacts.
Our novel approach extends the well-known
Patch Match algorithm, introducing an improved random
search function that takes advantage of the epipolar
geometry. A coherence term is also used to improve
the matching process. These modifications allow us to
extend the original approach to HDR stereo
matching, while improving its computational performance.
We have presented a series of experimental
results showing the robustness of our approach in the
matching process when compared with the original approach,
together with its qualitative results.
Figure 11: Up: Set of LDR multi-view images acquired using the Octo-cam [PcPD+10]. Down: the resulting tone mapped
HDR taking each LDR as reference respectively. Despite the important exposure differences of the LDR sequence, coherent
HDR results are obtained. It is important to mention that highly saturated areas remain saturated in the resulting HDR.
The authors would like to thank the reviewers and those
who collaborated with us for their suggestions. This
work was partially funded by the TIN2013-47137-C2-2-P
project from Ministerio de Economía y Competitividad,
Spain. It was also funded by the Spanish Ministry of Sciences
through the Innovation Sub-Program Ramón y
Cajal RYC-2011-09372 and by the Catalan Government
2014 SGR 1232. It was also supported by the
EU COST HDRi IC1005 and the SGR 2014 1232 from
Generalitat de Catalunya.
[AKCG14] Tara Akhavan, Christian Kapeller, Ji-Ho Cho,
and Margrit Gelautz. Stereo hdr disparity
map computation using structured light. In
HDRi2014 Second International Conference
and SME Workshop on HDR imaging, 2014.
[AYG13] Tara Akhavan, Hyunjin Yoo, and Margrit
Gelautz. A framework for hdr stereo matching
using multi-exposed images. In Proceedings of
HDRi2013 First International Conference and
SME Workshop on HDR imaging, Paper no. 8,
Oxford/Malden, 2013. The Eurographics Asso-
ciation and Blackwell Publishing Ltd.
[Ble] Patchmatch stereo - stereo matching with
slanted support windows.
[BLV+12] Jennifer Bonnard, Celine Loscos, Gilles
Valette, Jean-Michel Nourrit, and Laurent
Lucas. High-dynamic range video acquisition
with a multiview camera. Optics, Photonics,
and Digital Technologies for Multimedia
Applications II, pages 84360A–84360A–11,
[BRG+14] Michel Bätz, Thomas Richter, Jens-Uwe Garbas,
Anton Papst, Jürgen Seiler, and André
Kaup. High dynamic range video reconstruction
from a stereo camera setup. Signal Processing:
Image Communication, 29(2):191–202,
2014. Special Issue on Advances in High Dynamic
Range Video Research.
[BSFG09] Connelly Barnes, Eli Shechtman, Adam Finkel-
stein, and Dan B Goldman. Patchmatch: A ran-
domized correspondence algorithm for struc-
tural image editing. ACM Transactions on
Graphics (Proc. SIGGRAPH), 28(3), aug 2009.
[BVNL14] J. Bonnard, G. Valette, J.-M. Nourrit, and
C. Loscos. Analysis of the Consequences of
Data Quality and Calibration on 3D HDR Im-
age Generation. In European Signal Process-
ing Conference (EUSIPCO), Lisbonne, Portu-
gal, 2014.
[DM97] Paul Debevec and Jitendra Malik. Recover-
ing high dynamic range radiance maps from
photographs. In Proceedings of ACM
SIGGRAPH (Computer Graphics), volume 31,
pages 369–378, 1997.
[HTM13] Kanita Karaduzovic Hadziabdic, Jasminka Ha-
sic Telalovic, and Rafal Mantiuk. Comparison
of deghosting algorithms for multi-exposure
high dynamic range imaging. In Proceedings
of the 29th Spring Conference on Computer
Graphics, SCCG ’13, pages 021:21–021:28,
New York, NY, USA, 2013. ACM.
[KAR06] E.A. Khan, A.O. Akyüz, and E. Reinhard.
Ghost removal in high dynamic range images.
In Image Processing, 2006 IEEE International
Conference on, pages 2005–2008, Oct 2006.
[KUWS03] Sing Bing Kang, Matthew Uyttendaele, Simon
Winder, and Richard Szeliski. High dynamic
range video. ACM Trans. Graph., 22(3):319–
325, 2003.
[LC09] Huei-Yung Lin and Wei-Zhe Chang. High
dynamic range imaging for stereoscopic scene
representation. In Image Processing (ICIP),
2009 16th IEEE International Conference on,
pages 4305–4308, 2009.
[LLR13] Laurent Lucas, Céline Loscos, and Yannick
Remion. 3D Video from Capture to Diffusion.
Wiley-ISTE, October 2013.
[MG10] Stephen Mangiat and Jerry Gibson. High dy-
namic range video with ghost removal. Proc.
SPIE, 7798:779812–779812–8, 2010.
[MN99] T. Mitsunaga and S.K. Nayar. Radiometric self
calibration. IEEE International Conf. Com-
puter Vision and Pattern Recognition, 1:374–
380, 1999.
[MP95] Steve Mann and R. W. Picard. On Being undig-
ital With Digital Cameras: Extending Dynamic
Range By Combining Differently Exposed Pic-
tures. Perceptual Computing Section, Media
Laboratory, Massachusetts Institute of Technol-
ogy, 1995.
[OMLA13] Raissel Ramirez Orozco, Ignacio Martin, Celine
Loscos, and Alessandro Artusi. Patch-based
registration for auto-stereoscopic HDR
content creation. In HDRi2013 - First International
Conference and SME Workshop on HDR
imaging, Oporto, Portugal, April 2013.
10] Jessica Prévoteau, Sylvia Chalençon-Piotin,
Didier Debons, Laurent Lucas, and Yannick
Remion. Multi-view shooting geometry for
multiscopic rendering with controlled distor-
tion. International Journal of Digital Multi-
media Broadcasting (IJDMB), special issue Ad-
vances in 3DTV: Theory and Practice, 2010:1–
11, March 2010.
[RBS99] Mark A. Robertson, Sean Borman, and
Robert L. Stevenson. Dynamic range improvement
through multiple exposures. In Proc. of
the Int. Conf. on Image Processing (ICIP 1999),
pages 159–163. IEEE, 1999.
[RBS03] Mark A. Robertson, Sean Borman, and
Robert L. Stevenson. Estimation-theoretic ap-
proach to dynamic range enhancement using
multiple exposures. Journal of Electronic Imag-
ing, 12(2):219–228, 2003.
[Ruf11] Dominic Rufenacht. Stereoscopic High Dy-
namic Range Video. PhD thesis, Ecole
Polytechnique Fédérale de Lausanne (EPFL),
Switzerland, August 2011.
[SCSI08] D. Simakov, Y. Caspi, E. Shechtman, and
M. Irani. Summarizing visual data using
bidirectional similarity. IEEE Conference on
Computer Vision and Pattern Recognition 2008
(CVPR’08), 2008.
[SDBRC14] Elmedin Selmanovic, Kurt Debattista, Thomas
Bashford-Rogers, and Alan Chalmers. Enabling
stereoscopic high dynamic range video. Signal
Processing: Image Communication, 29(2):216–228,
2014. Special Issue on Advances in High
Dynamic Range Video Research.
12] Pradeep Sen, Nima Khademi Kalantari, Maziar
Yaesoubi, Soheil Darabi, Dan B. Goldman, and
Eli Shechtman. Robust patch-based HDR re-
construction of dynamic scenes. ACM Transac-
tions on Graphics (Proceedings of SIGGRAPH
Asia 2012), 31(6):203:1–203:11, November 2012.
[SMW10] N. Sun, H. Mansour, and R. Ward. HDR image
construction from multi-exposed stereo LDR images.
In Proceedings of the IEEE International
Conference on Image Processing (ICIP), Hong
Kong, 2010.
[SS12] Abhilash Srikantha and Désiré Sidibé.
Ghost Detection and Removal for High Dynamic
Range Images: Recent Advances. Signal
Processing: Image Communication,
doi:10.1016/j.image.2012.02.001, February 2012.
23 pages.
[ST04] Peter Sand and Seth Teller. Video matching.
ACM Transactions on Graphics, 23(3):592–
599, August 2004.
[TKS06] A. Troccoli, Sing Bing Kang, and S. Seitz.
Multi-view multi-exposure stereo. In 3D Data
Processing, Visualization, and Transmission,
Third International Symposium on, pages 861–
868, June 2006.
[vis06] Middlebury stereo datasets.
http://vision.middlebury.edu/stereo/data/, 2006.
[ZF03] Barbara Zitova and Jan Flusser. Image regis-
tration methods: a survey. Image and Vision
Computing, 21:977–1000, 2003.