
Proc. International Conference on Image Processing ICIP-2002, Rochester, New York, Sept. 2002. © 2002 IEEE

GEOMETRY REFINEMENT FOR LIGHT FIELD COMPRESSION

Prashant Ramanathan, Eckehard Steinbach, Peter Eisert and Bernd Girod

Information Systems Laboratory

Stanford University

{pramanat, steinb, eisert, bgirod}@Stanford.EDU

ABSTRACT

In geometry-aided light ﬁeld compression, a geometry model is

used for disparity-compensated prediction of light ﬁeld images

from already encoded light ﬁeld images. This geometry model,

however, may have limited accuracy. We present an algorithm that

refines a geometry model to improve the overall light field compression efficiency. This algorithm uses an optical-flow technique

to explicitly minimize the disparity-compensated prediction error.

Results from experiments performed on both real and synthetic

data sets show bit-rate reductions of approximately

% using the

improved geometry model over a silhouette-reconstructed geome-

try model.

1. INTRODUCTION

Image-based rendering has emerged as an important new alterna-

tive to traditional image synthesis techniques in computer graph-

ics. With image-based rendering, scenes can be rendered by sam-

pling previously acquired image data, instead of synthesizing them

from light and surface shading models and scene geometry. Light

ﬁeld rendering [1, 2] is one such image-based technique that is

particularly useful for interactive applications.

A light ﬁeld is a 4-D data set which can be parameterized as

a 2-D array of 2-D light ﬁeld images. For photo-realistic quality,

a large number of high-resolution light ﬁeld images is required,

resulting in extremely large data sets. For example, the light ﬁeld

of Michelangelo’s statue of Night contains tens of thousands of

images and requires over 90 Gigabytes of storage for raw data [3].

Compression is therefore essential for light ﬁelds.

Currently, the most efﬁcient techniques for light ﬁeld com-

pression use disparity compensation, analogous to motion com-

pensation in video compression. In disparity compensation, im-

ages are predicted from previously encoded reference images. Dis-

parity or depth values are either speciﬁed for a block of pixels, or

inferred from a geometry model [4, 5, 6].

In this paper, we consider disparity-compensated light ﬁeld

compression using an explicit geometry model. A geometry model

can be an efﬁcient method of specifying the depth values required

for disparity-compensated prediction. The geometry models used

may be of limited accuracy for several reasons. The model may be

generated from image data using error-prone computer vision tech-

niques. Even an accurate model must be represented digitally in a

ﬁnite number of bits, and therefore, some degree of approximation

is necessary. The result of this geometry inaccuracy is reduced

compression efﬁciency for the compression algorithm [7].

In this paper, we describe a method of reﬁning the geome-

try model to reduce the disparity-compensated prediction error,

and improve compression efﬁciency. Our algorithm is similar to

the Sliding Textures approach [8, 9], differing in only a few de-

tails. One of the main contributions of this paper is to apply these

ideas to the problem of light ﬁeld compression. In Section 2, we

review the basics of geometry-based disparity-compensated light

ﬁeld compression. In Section 3, we present our method for geom-

etry reﬁnement. We present our results in Section 4.

2. GEOMETRY-BASED DISPARITY-COMPENSATED

LIGHT FIELD COMPRESSION

Disparity compensation is used in most current light ﬁeld com-

pression algorithms [4, 5, 6]. The underlying idea of disparity-

compensated prediction is that a pixel in a light ﬁeld image can

be predicted from corresponding pixels in one or more other light

ﬁeld images. This prediction requires a depth value for a given

pixel. In a light ﬁeld, the recording geometry is known, which

means that by specifying the depth, it is possible to establish cor-

respondence between pixels in two different views. This pixel cor-

respondence allows for the prediction of pixels of one view from

another. We assume that the imaged surface in both views looks

similar, which is true for Lambertian, unoccluded surfaces.

In a geometry-based prediction scheme, depth values are in-

ferred from an explicit geometry model by rendering the model.

The reference images that are used to predict a particular image

must be deﬁned. We follow the hierarchical coding structure de-

scribed by Magnor and Girod [4]. Here, each image is predicted

from two to four reference images. The order in which images are

encoded is also deﬁned, so that images are predicted from images

that are already encoded.
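As an illustrative sketch of this prediction step, the following fragment back-projects a target pixel to its 3-D point at a given depth and samples the corresponding pixel in a reference image. The pinhole camera convention and the nearest-neighbour sampling are our own simplifying assumptions, not details of the coder in [4].

```python
import numpy as np

def backproject(K, R, t, u, v, z):
    """Back-project pixel (u, v) of the target view to the 3-D point at
    depth z, using the pinhole model x ~ K (R X + t)."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing ray in camera frame
    return R.T @ (z * ray - t)                      # point in world coordinates

def project(K, R, t, X):
    """Project world point X into a view; returns (u, v) pixel coordinates."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

def predict_pixel(ref_img, K_t, R_t, t_t, K_r, R_r, t_r, u, v, z):
    """Disparity-compensated prediction of one target pixel: establish the
    correspondence via the depth z and copy the reference intensity
    (nearest-neighbour sampling for simplicity)."""
    X = backproject(K_t, R_t, t_t, u, v, z)
    ur, vr = project(K_r, R_r, t_r, X)
    return ref_img[int(round(vr)), int(round(ur))]
```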

3. PROPOSED ALGORITHM FOR GEOMETRY

REFINEMENT

In this section we present an algorithm that modiﬁes the geom-

etry model for better compression performance. We explain the

optical-ﬂow-based shape reﬁnement method, and the iterative reg-

ularized least-squares method used to ﬁnd the solution.

3.1. Optical-ﬂow-based equation

A light ﬁeld image may be predicted from one or more reference

light ﬁeld images that have been previously encoded, according to

the hierarchical structure discussed in the previous section. We call
the image and pixel to be predicted the target image and pixel, and
the images and pixels from which they are predicted the reference

images and pixels. The intensity of a given target pixel is predicted


from the corresponding pixel in a reference image. Any given target pixel corresponds to a specific 3-D point along the viewing ray in 3-D space. If we allow this line to be parameterized by the depth $z$, we obtain the line equation, in world coordinates,

$$\mathbf{X}(z) = \mathbf{c} + z\,\mathbf{d}(x, y), \qquad (1)$$

where $\mathbf{c}$ is the camera center of the target view and $\mathbf{d}(x, y)$ is the direction of the viewing ray through pixel $(x, y)$. Note that this line is a function of the intrinsic and extrinsic camera parameters of the target view, as well as the pixel position in the image. By projecting this line into the reference view, we obtain a 2-D line $\mathbf{x}_r(z)$. This so-called epipolar line

$$\mathbf{x}_r(z) = \mathcal{P}_r\!\left(\mathbf{X}(z)\right), \qquad (2)$$

also parameterized by the original depth parameter $z$, is a function of the pixel position in the target image, and the camera parameters of the target and reference images.

Specifying a depth value fixes the point in 3-D space as well as the point in the reference view on the epipolar line. Thus, we obtain a corresponding reference pixel $\mathbf{x}_r(z)$ for the target pixel. We assume that the true value of $z$ will result in the same intensity for the corresponding target and reference pixels. An error in the depth will result in a prediction error, denoted by the difference in intensity $e$. This is a function of the intensity value at the target pixel and at the reference pixel, given by

$$e = I_t(x, y) - I_r\!\left(\mathbf{x}_r(\tilde{z})\right) = I_r\!\left(\mathbf{x}_r(z)\right) - I_r\!\left(\mathbf{x}_r(\tilde{z})\right), \qquad (3)$$

where $\tilde{z}$ is the current (inaccurate) depth value, and $z$ is the correct depth value that results in no prediction error.

If we assume the intensity gradient

$$\nabla I_r = \left( \frac{\partial I_r}{\partial x_r},\; \frac{\partial I_r}{\partial y_r} \right)^T \qquad (4)$$

to be locally constant over this region in the reference image, we obtain the familiar optical flow equation

$$e = \nabla I_r^{\,T}\left( \mathbf{x}_r(z) - \mathbf{x}_r(\tilde{z}) \right). \qquad (5)$$
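Equation (5) can be illustrated numerically: given the prediction error, the local image gradient, and the change of the reference-pixel position with depth, the depth correction for a single pixel follows from a scalar division. The function below is a hedged sketch, not the paper's implementation.

```python
import numpy as np

def depth_update(e, grad_I, dxr_dz):
    """Solve the 1-D optical-flow relation  e ~ grad_I . (dx_r/dz) * dz
    for the depth correction dz of a single pixel."""
    a = float(np.dot(grad_I, dxr_dz))        # scalar sensitivity of intensity to depth
    return e / a if abs(a) > 1e-12 else 0.0  # guard against a vanishing gradient
```

For example, a prediction error of 0.5 with gradient (1, 0) and epipolar slope (2, 0) gives a depth correction of 0.25.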

However, this equation only relates the prediction error to the depth parameter for a given pixel. We need to further relate this to the parameters of the geometry model using the relation

$$z = f(\mathbf{p}), \qquad (6)$$

where $\mathbf{p}$ is the vector of geometry parameters and $f(\cdot)$ is a nonlinear multivariate function that maps the geometry parameters to the depth for a given pixel.

We now describe the mapping function for our problem. For

the triangle mesh geometry model that we use, the geometry pa-

rameters are the positions of each of the vertices in the model. In

addition, we restrict the movement of these vertices to one degree

of freedom, radially from the center of the model.

For a particular target pixel, the corresponding 3-D line in-

tersects the geometry at exactly one triangle face. Therefore, the

depth parameter is determined by the three vertices that deﬁne

this triangle face. By characterizing this triangle as an inﬁnite

plane deﬁned by its three vertices, we obtain a differentiable func-

tion that describes the depth parameter in terms of the geom-

etry parameters. Note that we assume that other vertices will

not affect this pixel, through occlusion for example, and that the

pixel will not move off this triangle. Both of these assumptions are

supported by the restriction that changes in the geometry parame-

ters will be small, enforced by regularization in the least-squares

solution.
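The ray/plane intersection described above can be sketched as follows; the vertex coordinates and ray parameterization in the usage are purely illustrative.

```python
import numpy as np

def ray_plane_depth(c, d, v0, v1, v2):
    """Depth z at which the viewing ray X(z) = c + z*d meets the infinite
    plane through the triangle vertices v0, v1, v2.  This z is a
    differentiable function of the vertex positions."""
    n = np.cross(v1 - v0, v2 - v0)           # plane normal
    return np.dot(n, v0 - c) / np.dot(n, d)  # assumes the ray is not parallel to the plane
```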

Combining (5) and (6), we obtain

$$e = \nabla I_r^{\,T}\left( \mathbf{x}_r(f(\mathbf{p})) - \mathbf{x}_r(f(\tilde{\mathbf{p}})) \right), \qquad (7)$$

where $\tilde{\mathbf{p}}$ denotes the current geometry configuration. Both $\mathbf{x}_r$ and $f$ are non-linear functions of the parameter vector $\mathbf{p}$. We can linearize this equation, and the resulting equation will be valid locally around $\tilde{\mathbf{p}}$. If

$$\Delta\mathbf{p} = \mathbf{p} - \tilde{\mathbf{p}}, \qquad (8)$$

then

$$\mathbf{x}_r(f(\mathbf{p})) \approx \mathbf{x}_r(f(\tilde{\mathbf{p}})) + \mathbf{J}\,\Delta\mathbf{p}, \qquad (9)$$

where $\mathbf{J} = \left. \partial \mathbf{x}_r / \partial \mathbf{p} \right|_{\tilde{\mathbf{p}}}$ and $\mathbf{J}$ is a matrix of size $2 \times N$, with $N$ as the number of geometry parameters.

Substituting (9) into (7), we obtain the following equation

$$e = \nabla I_r^{\,T}\,\mathbf{J}\,\Delta\mathbf{p} \qquad (10)$$

for each pixel and a corresponding reference view.

3.2. Least-Squares Solution

This equation may be derived for all the pixels in the light field that are to be predicted, and can be combined to form the matrix equation

$$\mathbf{e} = \mathbf{A}\,\Delta\mathbf{p}, \qquad (11)$$

where

$$\mathbf{A} = \left[\, \mathbf{a}_1,\; \mathbf{a}_2,\; \dots,\; \mathbf{a}_M \,\right]^T, \qquad (12)$$

with row $\mathbf{a}_i^T = \nabla I_r^{\,T}\mathbf{J}$ evaluated for the $i$-th pixel equation, and

$$\mathbf{e} = \left[\, e_1,\; e_2,\; \dots,\; e_M \,\right]^T. \qquad (13)$$

Because our linearized problem and our mapping function are only valid for small $\Delta\mathbf{p}$, we must include regularization in the solution of our problem. This gives us the equation

$$\Delta\mathbf{p} = \arg\min_{\Delta\mathbf{p}} \left\{ \left\| \mathbf{A}\,\Delta\mathbf{p} - \mathbf{e} \right\|^2 + \lambda^2 \left\| \Delta\mathbf{p} \right\|^2 \right\}, \qquad (14)$$

where $\lambda$ is the regularization constant. A larger value for $\lambda$ means that the solution $\Delta\mathbf{p}$ will be smaller, and the problem will be more numerically stable. When $\lambda$ is too large, however, it takes many iterations to converge to the solution. In our experiments, the value for $\lambda$ is selected empirically.

This linearized problem can be solved using the least-squares

approach. Since an equation is formed for each pixel predicted

from a reference view, the number of rows of $\mathbf{A}$ can be large. In any particular equation, only three parameters are specified; therefore, $\mathbf{A}$ is also sparse. We use the LSQR method [10], which is

well-suited to large, sparse problems, in our implementation.
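On a small dense example, the damped least-squares problem can be sketched with the normal equations; the actual system is large and sparse, which is why the paper uses LSQR, whose damping plays the role of the regularization constant. The solver choice aside, the following shows the shrinking effect of the regularization:

```python
import numpy as np

def regularized_step(A, e, lam):
    """Solve  min ||A dp - e||^2 + lam^2 ||dp||^2  via the normal equations,
    (A^T A + lam^2 I) dp = A^T e.  A dense stand-in for damped LSQR."""
    N = A.shape[1]
    return np.linalg.solve(A.T @ A + lam**2 * np.eye(N), A.T @ e)
```

With `lam = 0` this is ordinary least squares; increasing `lam` shrinks the step, matching the trade-off between numerical stability and convergence speed discussed above.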

Once we obtain a new geometry model from the solution, we

can again linearize the equations about the new operating point,

and solve for the new change in geometry parameters. We can

iteratively perform these two steps until we converge to the best

geometry model.
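The two-step iteration can be summarized abstractly; `linearize` and `solve_step` are hypothetical callbacks standing in for the construction of the linearized system and the regularized solve:

```python
def refine(p0, linearize, solve_step, n_iters=5):
    """Iterative geometry refinement: re-linearize about the current
    parameters, solve for the regularized update, apply it, repeat."""
    p = p0
    for _ in range(n_iters):
        A, e = linearize(p)        # build the linearized system at p
        p = p + solve_step(A, e)   # regularized least-squares update
    return p
```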


4. RESULTS

Our experiments use both real and synthetic light ﬁeld data sets.

An initial approximate geometry model is created using the silhou-

ette information from the light ﬁeld image data. This silhouette-

reconstructed geometry model is reﬁned using the technique de-

scribed in this paper to obtain the improved geometry model. For

the synthetic light ﬁelds, we also have the true geometry models,

which can serve as a useful reference point. We encode the light

ﬁelds using each of these geometry models, and compare their rel-

ative rate-PSNR performance. The light ﬁeld coder is described

next.

4.1. Light Field Coder

The light ﬁeld coder in our work uses block-based disparity-

compensation both without and with an explicit geometry model

[4, 5]. All images are divided up into

blocks. Each block is

encoded in one of several modes: the INTRA mode, where DCT-

based image compression is used for the block; the GEO mode,

where an explicit geometry model is used to predict the block from

reference images; the STD (standard) mode, where a depth value

is speciﬁed to predict the block from reference images; and the

COPY mode, where a block from the same image location is sim-

ply copied from the reference image. For the STD mode, the depth

values are quantized such that they correspond to approximately

integer-pixel accuracy in the image plane. In the GEO and STD

modes, a DCT-based residual encoder is used on the prediction er-

ror. Mode selection is based on a rate-distortion Lagrangian cost function

$$J = D + \lambda_{\mathrm{mode}} R, \qquad (15)$$

where $D$ is the sum-squared-error distortion of the block, and $R$ is the rate in bits for the block. The mode with the smallest Lagrangian cost is chosen. A rate-PSNR curve is obtained by varying the image quality, using the quantization parameter $QP$ in the DCT intra and residual coders. The Lagrangian multiplier $\lambda_{\mathrm{mode}}$ that is used to trade off rate versus distortion is adjusted according to the quantization parameter using the following equation commonly used in video compression [11]:

$$\lambda_{\mathrm{mode}} = 0.85 \cdot QP^2. \qquad (16)$$
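The mode decision amounts to a simple minimization over per-mode (distortion, rate) pairs; the mode names follow the text, while the cost numbers in the usage example are invented for illustration.

```python
def select_mode(costs, lam):
    """Rate-distortion mode decision: pick the mode minimizing
    J = D + lam * R over a dict {mode: (distortion, rate_bits)}."""
    return min(costs, key=lambda m: costs[m][0] + lam * costs[m][1])
```

With made-up costs `{"INTRA": (100.0, 10.0), "GEO": (120.0, 2.0)}`, a large multiplier favours the cheap-rate GEO mode, while a small one favours the low-distortion INTRA mode.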

4.2. Experiments

Four data sets were used in our experiments. The ﬁrst two, Star

and Cube, are synthetic light ﬁelds, each with images of reso-

lution . The last two, Garﬁeld29 and Garﬁeld288, are

light ﬁelds recorded from a real-world object, the same plush toy.

Garﬁeld29 has images, each of resolution , covering

the frontal region of the object, while Garﬁeld288 has images,

each of resolution covering the entire hemi-sphere of

views.

For each of the data sets, we derive a geometry model that is

consistent with the silhouette from each view. We begin with a

-vertex subdivided icosahedron model that is larger than the

object. In each view, if a vertex lies outside of the
silhouette, we move it radially towards a center point so that it lies

on the silhouette border. We thereby obtain a -vertex object

that matches the silhouette in all views.
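The per-vertex carving step might be sketched as follows; `inside_all`, which tests whether a vertex projects inside every view's silhouette, is a hypothetical helper, and the fixed shrink factor is a simplification of moving the vertex exactly to the silhouette border.

```python
import numpy as np

def carve_vertex(v, center, inside_all, shrink=0.95, max_steps=200):
    """Move a vertex radially towards the model center until it lies
    inside the silhouette in every view (silhouette reconstruction)."""
    for _ in range(max_steps):
        if inside_all(v):
            break
        v = center + shrink * (v - center)   # one radial step inward
    return v
```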

For each of the four light ﬁelds, we reﬁne this silhouette-

reconstructed geometry object to obtain an improved geometry ob-

ject. Typically, we use anywhere from to iterations and

a regularization constant . Both of these quantities are

determined empirically based on the results.

Figure 1 shows the results of our algorithm for the Star light

ﬁeld. Figure 1(a) shows the true geometry, Figure 1(b) shows the

silhouette-reconstructed geometry, and Figure 1(c) shows the re-

ﬁned geometry. Figure 2 illustrates the geometry results for the

real-world Garfield29 light field. Figure 2(a) shows the face of

the object from one image of the light ﬁeld. Figure 2(b) shows

the silhouette-reconstructed geometry, and Figure 2(c) shows the

reﬁned geometry.

(a) True Geo. (b) Silhouette Geo.

(c) Improved Geo.

Fig. 1. Geometry models for the Star light ﬁeld. The near-exact

constrained geometry (not pictured) is visually identical to the true

geometry.

(a) Light ﬁeld image

(b) Silhouette Geo.

(c) Improved Geo.

Fig. 2. Magnified portion of light field image and geometry models

for the Garﬁeld29 light ﬁeld.

For all of the light ﬁeld data sets, we compare the efﬁciency of

our light ﬁeld coder using the silhouette-reconstructed geometry

versus the improved geometry. For the two synthetic light ﬁelds,

we can compare with the results of the true geometry model as

well. Our algorithm constrains the set of possible improved ge-

ometry outcomes, since it uses only vertices and constrains

the positions of these vertices to be in the same directions as the

original subdivided icosahedron vertices. To understand the possi-

ble effect of these constraints, we create another geometry model

that is subject to these constraints, but ﬁt as close as possible to

the exact geometry model. We call this our near-exact constrained

geometry. This geometry model represents the best possible ge-

ometry result under the constraints that we have placed on the al-

gorithm.

Figures 3 and 4 show the Rate-PSNR curves using the various

geometry models for the Star light ﬁeld and the Garﬁeld29 light

ﬁeld, respectively. The bit-rate for the geometry models is not

included. Since we have a regular icosahedron arrangement of

vertices, where only the vertex radii must be speciﬁed, this

bit-rate will be negligible compared to the overall bit-rate for the

light ﬁeld. The PSNR is measured over the entire image.

Due to space considerations, we do not show the curves for the other data sets. In all cases, we see a bit-rate reduction of approximately % using the improved geometry instead of the silhouette-reconstructed geometry. This corresponds to an increase of approximately dB in PSNR. The results for the synthetic data sets

indicate that there still exists a large performance gap between

the improved geometry and the exact geometry. The results for

the near-exact constrained geometry show, however, that only an-

other % is possible using our constrained arrangement. In other

words, our improved geometry realizes % of the gain possible

under our constrained arrangement.

[Figure 3 plot: Bit Rate (bits/pixel), 0 to 1, versus PSNR (dB), 30 to 48; curves for Silhouette Geometry, Improved Geometry, Near-exact Constrained Geometry, and Exact Geometry.]

Fig. 3. Rate-PSNR for Star Light Field. We see a % bit-rate re-

duction using the improved geometry over the original silhouette-

reconstructed geometry. The near-exact constrained geometry

shows us the best possible result for our constrained arrangement.

There is still a large performance gap from the exact geometry re-

sults.

[Figure 4 plot: Bit Rate (bits/pixel), 0 to 0.4, versus PSNR (dB), 34 to 50; curves for Silhouette Geometry and Improved Geometry.]

Fig. 4. Rate-PSNR for Garﬁeld29 Light Field. We see a %

bit-rate reduction using the improved geometry over the original

silhouette-reconstructed geometry for this real-world light ﬁeld.

5. CONCLUSIONS

We have presented an algorithm to automatically reﬁne the ge-

ometry model used for disparity-compensated light ﬁeld com-

pression. This improved geometry model reduces the disparity-

compensation prediction error and improves the compression ef-

ﬁciency. Our experiments show bit-rate savings of approxi-

mately % using the reﬁned geometry model over the silhouette-

reconstructed geometry model. These experiments were per-

formed on both real and synthetic light ﬁeld data sets.

Results from the synthetic data sets indicate that the algorithm

may be improved signiﬁcantly by relaxing some of the geometric

constraints in the algorithm.

6. REFERENCES

[1] Marc Levoy and Pat Hanrahan, “Light ﬁeld rendering,” in

Computer Graphics (Proceedings SIGGRAPH96), August

1996, pp. 31–42.

[2] Steven J. Gortler, Radek Grzeszczuk, Richard Szeliski, and

Michael F. Cohen, “The lumigraph,” in Computer Graphics

(Proceedings SIGGRAPH96), August 1996, pp. 43–54.

[3] Marc Levoy, Kari Pulli, et al., “The Digital Michelangelo

project: 3D scanning of large statues,” in Computer Graphics

(Proceedings SIGGRAPH00), August 2000, pp. 131–144.

[4] Marcus Magnor and Bernd Girod, “Data compression for

light ﬁeld rendering,” IEEE Transactions on Circuits and

Systems for Video Technology, vol. 10, no. 3, pp. 338–343,

April 2000.

[5] Marcus Magnor, Peter Eisert, and Bernd Girod, “Model-

aided coding of multi-viewpoint image data,” in Proceedings

of the IEEE International Conference on Image Processing

ICIP-2000, Vancouver, Canada, September 2000, vol. 2, pp.

919–922.

[6] Xin Tong and Robert M. Gray, “Coding of multi-view im-

ages for immersive viewing,” in Proceedings of the Interna-

tional Conference on Acoustics, Speech, and Signal Process-

ing ICASSP 2000, Istanbul, Turkey, June 2000, vol. 4, pp.

1879–1882.

[7] Marcus Magnor, Geometry-Adaptive Multi-View Coding

Techniques for Image-based Rendering, Ph.D. thesis, Uni-

versity Erlangen-Nuremberg, Germany, 2001.

[8] Peter Eisert, Eckehard Steinbach, and Bernd Girod, “Auto-

matic reconstruction of stationary 3-D objects from multiple

uncalibrated camera views,” IEEE Transactions on Circuits

and Systems for Video Technology, vol. 10, no. 2, pp. 261–

277, March 2000.

[9] Eckehard Steinbach, Peter Eisert, and Bernd Girod, “Model-

based 3-D shape and motion estimation using sliding tex-

tures,” in Proceedings Vision, Modelling and Visualization

2001, Stuttgart, Germany, November 2001.

[10] Christopher C. Paige and Michael A. Saunders, “LSQR:

An algorithm for sparse linear equations and sparse least

squares,” ACM Transactions on Mathematical Software, vol.

8, no. 1, pp. 43–71, March 1982.

[11] Gary J. Sullivan and Thomas Wiegand, “Rate-distortion op-

timization for video compression,” IEEE Signal Processing

Magazine, vol. 15, pp. 74–90, November 1998.