Proc. International Conference on Image Processing ICIP-2002, Rochester, New York, Sept. 2002. © 2002 IEEE
GEOMETRY REFINEMENT FOR LIGHT FIELD COMPRESSION
Prashant Ramanathan, Eckehard Steinbach, Peter Eisert and Bernd Girod
Information Systems Laboratory
Stanford University
{pramanat, steinb, eisert, bgirod}@Stanford.EDU
ABSTRACT
In geometry-aided light field compression, a geometry model is used for disparity-compensated prediction of light field images from already encoded light field images. This geometry model, however, may have limited accuracy. We present an algorithm that refines a geometry model to improve the overall light field compression efficiency. This algorithm uses an optical-flow technique to explicitly minimize the disparity-compensated prediction error. Results from experiments performed on both real and synthetic data sets show bit-rate reductions of approximately % using the improved geometry model over a silhouette-reconstructed geometry model.
1. INTRODUCTION
Image-based rendering has emerged as an important new alternative to traditional image synthesis techniques in computer graphics. With image-based rendering, scenes can be rendered by sampling previously acquired image data, instead of synthesizing them from light and surface shading models and scene geometry. Light field rendering [1, 2] is one such image-based technique that is particularly useful for interactive applications.

A light field is a 4-D data set which can be parameterized as a 2-D array of 2-D light field images. For photo-realistic quality, a large number of high-resolution light field images is required, resulting in extremely large data sets. For example, the light field of Michelangelo's statue of Night contains tens of thousands of images and requires over 90 Gigabytes of storage for raw data [3]. Compression is therefore essential for light fields.
Currently, the most efficient techniques for light field compression use disparity compensation, analogous to motion compensation in video compression. In disparity compensation, images are predicted from previously encoded reference images. Disparity or depth values are either specified for a block of pixels, or inferred from a geometry model [4, 5, 6].
In this paper, we consider disparity-compensated light field compression using an explicit geometry model. A geometry model can be an efficient method of specifying the depth values required for disparity-compensated prediction. The geometry models used may be of limited accuracy for several reasons. The model may be generated from image data using error-prone computer vision techniques. Even an accurate model must be represented digitally in a finite number of bits, and therefore some degree of approximation is necessary. The result of this geometry inaccuracy is reduced compression efficiency [7].
In this paper, we describe a method of refining the geometry model to reduce the disparity-compensated prediction error and improve compression efficiency. Our algorithm is similar to the Sliding Textures approach [8, 9], differing in only a few details. One of the main contributions of this paper is to apply these ideas to the problem of light field compression. In Section 2, we review the basics of geometry-based disparity-compensated light field compression. In Section 3, we present our method for geometry refinement. We present our results in Section 4.
2. GEOMETRY-BASED DISPARITY-COMPENSATED
LIGHT FIELD COMPRESSION
Disparity compensation is used in most current light field compression algorithms [4, 5, 6]. The underlying idea of disparity-compensated prediction is that a pixel in a light field image can be predicted from corresponding pixels in one or more other light field images. This prediction requires a depth value for a given pixel. In a light field, the recording geometry is known, which means that by specifying the depth, it is possible to establish correspondence between pixels in two different views. This pixel correspondence allows for the prediction of pixels of one view from another. We assume that the imaged surface looks similar in both views, which is true for Lambertian, unoccluded surfaces.
In a geometry-based prediction scheme, depth values are inferred from an explicit geometry model by rendering the model. The reference images that are used to predict a particular image must be defined. We follow the hierarchical coding structure described by Magnor and Girod [4]. Here, each image is predicted from two to four reference images. The order in which images are encoded is also defined, so that images are predicted from images that are already encoded.
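To make this dependency structure concrete, here is a minimal Python sketch of one possible coarse-to-fine traversal of the 2-D image grid. The level structure and the function name are illustrative assumptions; the exact hierarchy used in [4] may differ.

```python
# Hypothetical sketch of a hierarchical encoding order over an n x n grid of
# light field images: coarser grid levels are encoded first, so that every
# later image can be predicted from already-encoded neighbors.
def encoding_order(n, levels=4):
    """Yield (row, col) grid positions, coarsest level first."""
    seen = set()
    for lv in range(levels):
        step = max(1, (n - 1) >> lv)      # halve the grid spacing per level
        for r in range(0, n, step):
            for c in range(0, n, step):
                if (r, c) not in seen:
                    seen.add((r, c))
                    yield (r, c)

# Example: the four corners of a 9x9 grid come first, then intermediate views.
print(list(encoding_order(9))[:8])
```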
3. PROPOSED ALGORITHM FOR GEOMETRY
REFINEMENT
In this section we present an algorithm that modifies the geometry model for better compression performance. We explain the optical-flow-based shape refinement method and the iterative regularized least-squares method used to find the solution.
3.1. Optical-flow-based equation
A light field image may be predicted from one or more reference light field images that have been previously encoded, according to the hierarchical structure discussed in the previous section. We call the image and pixel to be predicted the target image and pixel, and the images and pixels from which they are predicted the reference images and pixels. The intensity of a given target pixel is predicted from the corresponding pixel in a reference image. Any given target pixel corresponds to a specific 3-D point along the viewing ray in 3-D space. If we allow this line to be parameterized by the depth $z$, we obtain the line equation, in world coordinates,

$$\mathbf{X}(z) = \mathbf{c}_t + z\,\mathbf{d} \qquad (1)$$

Note that this line is a function of the intrinsic and extrinsic camera parameters of the target view, as well as the pixel position in the image. By projecting this line into the reference view, we obtain a 2-D line $\mathbf{x}_r(z)$. This so-called epipolar line

$$\mathbf{x}_r(z) = P_r\big(\mathbf{X}(z)\big) \qquad (2)$$

also parameterized by the original depth parameter $z$, is a function of the pixel position in the target image and the camera parameters of the target and reference images.
Specifying a depth value fixes the point in 3-D space, as well as the point in the reference view on the epipolar line. Thus, we obtain a corresponding reference pixel for the target pixel. We assume that the true value of $z$ will result in the same intensity for the corresponding target and reference pixels. An error in the depth will result in a prediction error, denoted by the difference in intensity $\Delta I$. This is a function of the intensity values at the target pixel and at the reference pixel, given by

$$\Delta I = I_t - I_r\big(\mathbf{x}_r(\hat{z})\big) \qquad (3)$$

where $\hat{z}$ is the current (inaccurate) depth value, and $z$ is the correct depth value that results in no prediction error, i.e., $I_t = I_r\big(\mathbf{x}_r(z)\big)$.
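To illustrate (1)-(3) concretely, the following sketch back-projects a target pixel to its viewing ray and finds the reference pixel for a hypothesized depth. It assumes pinhole cameras with intrinsics K, rotation R, and camera center c, a convention the paper does not spell out.

```python
# Minimal sketch of eqs. (1)-(3), assuming pinhole cameras (K, R, c).
import numpy as np

def viewing_ray(K_t, R_t, c_t, u, v):
    """Eq. (1): back-project target pixel (u, v) to the ray X(z) = c_t + z d."""
    d = R_t.T @ np.linalg.solve(K_t, np.array([u, v, 1.0]))
    return c_t, d / np.linalg.norm(d)

def project(K_r, R_r, c_r, X):
    """Project a 3-D world point into the reference image."""
    x = K_r @ (R_r @ (X - c_r))
    return x[:2] / x[2]

def reference_pixel(target_cam, ref_cam, u, v, z):
    """Eq. (2): the reference pixel x_r(z) corresponding to (u, v) at depth z."""
    c, d = viewing_ray(*target_cam, u, v)
    return project(*ref_cam, c + z * d)
```

The prediction error of (3) is then simply the difference between the target intensity and the reference image sampled at `reference_pixel(..., z_hat)`.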
If we assume the intensity gradient

$$\nabla I_r = \left( \frac{\partial I_r}{\partial x},\; \frac{\partial I_r}{\partial y} \right) \qquad (4)$$

to be locally constant over this region in the reference image, we obtain the familiar optical flow equation

$$\Delta I = \nabla I_r \cdot \frac{\partial \mathbf{x}_r}{\partial z}\,(z - \hat{z}). \qquad (5)$$

However, this equation only relates the prediction error to the depth parameter $z$ for a given pixel. We need to further relate this to the parameters of the geometry model using the relation

$$z = f(\mathbf{g}) \qquad (6)$$

where $\mathbf{g}$ is the vector of geometry parameters and $f$ is a nonlinear multivariate function that maps the geometry parameters to the depth for a given pixel.
We now describe the mapping function $f$ for our problem. For the triangle mesh geometry model that we use, the geometry parameters are the positions of each of the vertices in the model. In addition, we restrict the movement of these vertices to one degree of freedom, radially from the center of the model.

For a particular target pixel, the corresponding 3-D line intersects the geometry at exactly one triangle face. Therefore, the depth parameter $z$ is determined by the three vertices that define this triangle face. By characterizing this triangle as an infinite plane defined by its three vertices, we obtain a differentiable function $f$ that describes the depth parameter $z$ in terms of the geometry parameters $\mathbf{g}$. Note that we assume that other vertices will not affect this pixel, through occlusion for example, and that the pixel will not move off this triangle. Both of these assumptions are supported by the restriction that changes in the geometry parameters will be small, enforced by regularization in the least-squares solution.
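A minimal sketch of this mapping for a single pixel is given below, assuming the three vertices move radially from the model center and that the depth is obtained by ray/plane intersection. The finite-difference Jacobian stands in for the closed-form derivative; all names are illustrative.

```python
# Sketch of z = f(g) (eq. 6) for one pixel: the three vertices of the
# intersected triangle are center + radii[i] * dirs[i], and the depth is the
# intersection of the viewing ray X(z) = o + z*d with the triangle's plane.
import numpy as np

def depth_from_radii(radii, dirs, center, o, d):
    """Depth z along the ray o + z*d to the plane of the radial triangle."""
    V = center + radii[:, None] * dirs          # 3x3: one vertex per row
    n = np.cross(V[1] - V[0], V[2] - V[0])      # plane normal
    return float(n @ (V[0] - o) / (n @ d))

def depth_jacobian(radii, dirs, center, o, d, eps=1e-6):
    """One row of the Jacobian J in eq. (9), by central differences over the
    three vertex radii that influence this pixel."""
    J = np.zeros(3)
    for i in range(3):
        rp, rm = radii.copy(), radii.copy()
        rp[i] += eps
        rm[i] -= eps
        J[i] = (depth_from_radii(rp, dirs, center, o, d)
                - depth_from_radii(rm, dirs, center, o, d)) / (2 * eps)
    return J
```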
Combining (5) and (6), we obtain

$$\Delta I = \nabla I_r \cdot \frac{\partial \mathbf{x}_r}{\partial z}\,\big(f(\mathbf{g}) - f(\hat{\mathbf{g}})\big) \qquad (7)$$

where $\hat{\mathbf{g}}$ denotes the current geometry configuration. Both $\mathbf{x}_r$ and $f$ are non-linear functions of the parameter vector $\mathbf{g}$. We can linearize this equation, and the resulting equation will be valid locally around $\hat{\mathbf{g}}$. If

$$\Delta\mathbf{g} = \mathbf{g} - \hat{\mathbf{g}} \qquad (8)$$

then

$$f(\mathbf{g}) \approx f(\hat{\mathbf{g}}) + \mathbf{J}\,\Delta\mathbf{g} \qquad (9)$$

where $\mathbf{J} = \partial f / \partial \mathbf{g}\,\big|_{\hat{\mathbf{g}}}$ is a matrix of size $1 \times K$, with $K$ as the number of geometry parameters.

Substituting (9) into (7), we obtain the following equation

$$\Delta I = \nabla I_r \cdot \frac{\partial \mathbf{x}_r}{\partial z}\,\mathbf{J}\,\Delta\mathbf{g} \qquad (10)$$

for each pixel and a corresponding reference view.
3.2. Least-Squares Solution
This equation may be derived for all the pixels in the light field that are to be predicted, and can be combined to form the matrix equation

$$\mathbf{b} = \mathbf{A}\,\Delta\mathbf{g} \qquad (11)$$

where

$$\mathbf{A} = \begin{bmatrix} \nabla I_{r,1} \cdot \frac{\partial \mathbf{x}_{r,1}}{\partial z}\,\mathbf{J}_1 \\ \vdots \\ \nabla I_{r,N} \cdot \frac{\partial \mathbf{x}_{r,N}}{\partial z}\,\mathbf{J}_N \end{bmatrix} \qquad (12)$$

and

$$\mathbf{b} = \begin{bmatrix} \Delta I_1 \\ \vdots \\ \Delta I_N \end{bmatrix}. \qquad (13)$$
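Assembled in code, each pixel contributes one row of A with three nonzero entries, one per vertex of the intersected triangle. The following sketch, with an assumed per-pixel tuple layout, shows the stacking of (11)-(13) into a sparse system.

```python
# Sketch of stacking the per-pixel equations (10) into the sparse system of
# eqs. (11)-(13). `equations` holds hypothetical per-pixel tuples
# (vertex_indices, grad_I, dxr_dz, J_row, delta_I) computed as sketched above.
import numpy as np
from scipy.sparse import coo_matrix

def assemble_system(equations, n_params):
    data, rows, cols, b = [], [], [], []
    for k, (verts, grad_I, dxr_dz, J_row, delta_I) in enumerate(equations):
        coeff = float(grad_I @ dxr_dz)        # scalar in front of J in eq. (10)
        for v, a in zip(verts, coeff * J_row):
            rows.append(k)
            cols.append(v)
            data.append(a)                    # three nonzeros per row
        b.append(delta_I)
    A = coo_matrix((data, (rows, cols)),
                   shape=(len(equations), n_params)).tocsr()
    return A, np.asarray(b)
```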
Because our linearized problem and our mapping function are only valid for small $\Delta\mathbf{g}$, we must include regularization in the solution of our problem. This gives us the equation

$$\Delta\hat{\mathbf{g}} = \arg\min_{\Delta\mathbf{g}}\; \|\mathbf{A}\,\Delta\mathbf{g} - \mathbf{b}\|^2 + \lambda\,\|\Delta\mathbf{g}\|^2 \qquad (14)$$

where $\lambda$ is the regularization constant. A larger value for $\lambda$ means that the solution will be smaller, and the problem will be more numerically stable. When $\lambda$ is too large, however, it takes many iterations to converge to the solution. In our experiments, the value for $\lambda$ is selected empirically.
This linearized problem can be solved using the least-squares approach. Since an equation is formed for each pixel predicted from a reference view, the number of rows of $\mathbf{A}$ can be large. In any particular equation, only three parameters are specified, therefore $\mathbf{A}$ is also sparse. We use the LSQR method [10], which is well-suited to large, sparse problems, in our implementation.
Once we obtain a new geometry model from the solution, we
can again linearize the equations about the new operating point,
and solve for the new change in geometry parameters. We can
iteratively perform these two steps until we converge to the best
geometry model.
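A hedged sketch of this loop follows, assuming SciPy's LSQR (an implementation of the Paige-Saunders algorithm [10]) as the solver. Note that its `damp` parameter d minimizes ||A dg - b||^2 + d^2 ||dg||^2, so d = sqrt(lambda) realizes (14); `build_equations` is the hypothetical linearization step sketched above.

```python
# Iterative refinement: relinearize (eqs. 8-10) around the current geometry,
# solve the regularized problem (14) with LSQR, and update the vertex radii.
import numpy as np
from scipy.sparse.linalg import lsqr

def refine_geometry(radii, build_equations, lam, n_iters):
    for _ in range(n_iters):
        A, b = build_equations(radii)           # A, b from eqs. (11)-(13)
        dg = lsqr(A, b, damp=np.sqrt(lam))[0]   # solves eq. (14)
        radii = radii + dg                      # new operating point
    return radii
```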
4. RESULTS
Our experiments use both real and synthetic light field data sets. An initial approximate geometry model is created using the silhouette information from the light field image data. This silhouette-reconstructed geometry model is refined using the technique described in this paper to obtain the improved geometry model. For the synthetic light fields, we also have the true geometry models, which can serve as a useful reference point. We encode the light fields using each of these geometry models and compare their relative rate-PSNR performance. The light field coder is described next.
4.1. Light Field Coder
The light field coder in our work uses block-based disparity compensation both without and with an explicit geometry model [4, 5]. All images are divided into blocks. Each block is encoded in one of several modes: the INTRA mode, where DCT-based image compression is used for the block; the GEO mode, where an explicit geometry model is used to predict the block from reference images; the STD (standard) mode, where a depth value is specified to predict the block from reference images; and the COPY mode, where a block from the same image location is simply copied from the reference image. For the STD mode, the depth values are quantized such that they correspond to approximately integer-pixel accuracy in the image plane. In the GEO and STD modes, a DCT-based residual encoder is used on the prediction error. Mode selection is based on a rate-distortion Lagrangian cost function

$$J = D + \lambda R \qquad (15)$$

where $D$ is the sum-squared-error distortion of the block, and $R$ is the rate in bits for the block. The mode with the smallest Lagrangian cost is chosen. A rate-PSNR curve is obtained by varying the image quality, using the quantization parameter $Q$ in the DCT intra and residual coders. The Lagrangian multiplier $\lambda$ that is used to trade off rate versus distortion is adjusted according to the quantization parameter using the following equation, commonly used in video compression [11]:

$$\lambda = 0.85\,Q^2 \qquad (16)$$
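As a sketch of this decision rule, assume a hypothetical trial encoder `encode_block(block, mode)` that returns the SSE distortion and the rate in bits for a block under a given mode; the mode choice of (15)-(16) is then:

```python
# Hedged sketch of the per-block mode decision of eq. (15) with the
# lambda-Q relation of eq. (16). encode_block(block, mode) -> (D, R) is an
# assumed interface for trial-encoding a block in a given mode.
def select_mode(block, encode_block, Q, modes=("INTRA", "GEO", "STD", "COPY")):
    lam = 0.85 * Q * Q                      # eq. (16), common in video coding [11]
    costs = {}
    for mode in modes:
        D, R = encode_block(block, mode)    # SSE distortion, rate in bits
        costs[mode] = D + lam * R           # Lagrangian cost, eq. (15)
    return min(costs, key=costs.get)        # smallest Lagrangian cost wins
```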
4.2. Experiments
Four data sets were used in our experiments. The first two, Star and Cube, are synthetic light fields. The last two, Garfield29 and Garfield288, are light fields recorded from a real-world object, the same plush toy. Garfield29 has 29 images covering the frontal region of the object, while Garfield288 has 288 images covering the entire hemisphere of views.
For each of the data sets, we derive a geometry model that is consistent with the silhouette from each view. We begin with a subdivided icosahedron model that is larger than the object. In each view, for each vertex, if the vertex lies outside of the silhouette, we move it radially towards a center point so that it lies on the silhouette border. We thereby obtain an object that matches the silhouette in all views.
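The sketch below shows this carving step for a single vertex. The binary silhouette masks, the bisection search, and the helper names are assumptions; `project` is the pinhole projection from the earlier sketch in Section 3.1.

```python
# Sketch of the silhouette reconstruction: a vertex that projects outside a
# view's silhouette is moved radially toward the center until it reaches the
# silhouette border.
import numpy as np

def inside(p, mask):
    """True if pixel p falls on a foreground pixel of the binary mask."""
    u, v = int(round(p[0])), int(round(p[1]))
    return 0 <= v < mask.shape[0] and 0 <= u < mask.shape[1] and bool(mask[v, u])

def carve_vertex(radius, direction, center, cameras, masks, tol=1e-3):
    for cam, mask in zip(cameras, masks):
        if inside(project(*cam, center + radius * direction), mask):
            continue                        # already on/inside this silhouette
        lo, hi = 0.0, radius                # bisect to the silhouette border
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if inside(project(*cam, center + mid * direction), mask):
                lo = mid
            else:
                hi = mid
        radius = lo                         # later views can only shrink further
    return radius
```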
For each of the four light fields, we refine this silhouette-reconstructed geometry object to obtain an improved geometry object. The number of iterations and the regularization constant $\lambda$ are determined empirically based on the results.
Figure 1 shows the results of our algorithm for the Star light field. Figure 1(a) shows the true geometry, Figure 1(b) shows the silhouette-reconstructed geometry, and Figure 1(c) shows the refined geometry. Figure 2 illustrates the geometry results for the real-world Garfield29 light field. Figure 2(a) shows only the face of the object from one image of the light field. Figure 2(b) shows the silhouette-reconstructed geometry, and Figure 2(c) shows the refined geometry.
Fig. 1. Geometry models for the Star light field: (a) true geometry, (b) silhouette-reconstructed geometry, (c) improved geometry. The near-exact constrained geometry (not pictured) is visually identical to the true geometry.
Fig. 2. Magnified portion of a light field image and geometry models for the Garfield29 light field: (a) light field image, (b) silhouette-reconstructed geometry, (c) improved geometry.
For all of the light field data sets, we compare the efficiency of our light field coder using the silhouette-reconstructed geometry versus the improved geometry. For the two synthetic light fields, we can compare with the results of the true geometry model as well. Our algorithm constrains the set of possible improved geometry outcomes, since it uses a fixed set of vertices and constrains the positions of these vertices to lie along the same directions as the original subdivided icosahedron vertices. To understand the possible effect of these constraints, we create another geometry model that is subject to these constraints, but fit as closely as possible to the exact geometry model. We call this our near-exact constrained geometry. This geometry model represents the best possible geometry result under the constraints that we have placed on the algorithm.
Figures 3 and 4 show the Rate-PSNR curves using the various
geometry models for the Star light field and the Garfield29 light
field, respectively. The bit-rate for the geometry models is not
included. Since we have a regular icosahedron arrangement of
vertices, where only the vertex radii must be specified, this
bit-rate will be negligible compared to the overall bit-rate for the
light field. The PSNR is measured over the entire image.
Due to space considerations, we do not show the curves for the other data sets. In all cases, we see a bit-rate reduction of approximately % using the improved geometry instead of the silhouette-reconstructed geometry. This corresponds to an increase of approximately dB in PSNR. The results for the synthetic data sets indicate that there still exists a large performance gap between the improved geometry and the exact geometry. The results for the near-exact constrained geometry show, however, that only another % is possible using our constrained arrangement. In other words, our improved geometry realizes % of the gain possible under our constrained arrangement.
[Figure 3: rate-PSNR curves for the Star light field; bit rate (bits/pixel) versus PSNR (dB) for the silhouette, improved, near-exact constrained, and exact geometry models.]
Fig. 3. Rate-PSNR for the Star light field. We see a % bit-rate reduction using the improved geometry over the original silhouette-reconstructed geometry. The near-exact constrained geometry shows us the best possible result for our constrained arrangement. There is still a large performance gap from the exact geometry results.
[Figure 4: rate-PSNR curves for the Garfield29 light field; bit rate (bits/pixel) versus PSNR (dB) for the silhouette and improved geometry models.]
Fig. 4. Rate-PSNR for the Garfield29 light field. We see a % bit-rate reduction using the improved geometry over the original silhouette-reconstructed geometry for this real-world light field.
5. CONCLUSIONS
We have presented an algorithm to automatically refine the geometry model used for disparity-compensated light field compression. This improved geometry model reduces the disparity-compensated prediction error and improves the compression efficiency. Our experiments, performed on both real and synthetic light field data sets, show bit-rate savings of approximately % using the refined geometry model over the silhouette-reconstructed geometry model.

Results from the synthetic data sets indicate that the algorithm may be improved significantly by relaxing some of its geometric constraints.
6. REFERENCES
[1] Marc Levoy and Pat Hanrahan, "Light field rendering," in Computer Graphics (Proceedings SIGGRAPH 96), August 1996, pp. 31–42.
[2] Steven J. Gortler, Radek Grzeszczuk, Richard Szeliski, and Michael F. Cohen, "The lumigraph," in Computer Graphics (Proceedings SIGGRAPH 96), August 1996, pp. 43–54.
[3] Marc Levoy, Kari Pulli, et al., "The Digital Michelangelo project: 3D scanning of large statues," in Computer Graphics (Proceedings SIGGRAPH 2000), August 2000, pp. 131–144.
[4] Marcus Magnor and Bernd Girod, "Data compression for light field rendering," IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 3, pp. 338–343, April 2000.
[5] Marcus Magnor, Peter Eisert, and Bernd Girod, "Model-aided coding of multi-viewpoint image data," in Proceedings of the IEEE International Conference on Image Processing ICIP-2000, Vancouver, Canada, September 2000, vol. 2, pp. 919–922.
[6] Xin Tong and Robert M. Gray, "Coding of multi-view images for immersive viewing," in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing ICASSP 2000, Istanbul, Turkey, June 2000, vol. 4, pp. 1879–1882.
[7] Marcus Magnor, Geometry-Adaptive Multi-View Coding Techniques for Image-based Rendering, Ph.D. thesis, University Erlangen-Nuremberg, Germany, 2001.
[8] Peter Eisert, Eckehard Steinbach, and Bernd Girod, "Automatic reconstruction of stationary 3-D objects from multiple uncalibrated camera views," IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 2, pp. 261–277, March 2000.
[9] Eckehard Steinbach, Peter Eisert, and Bernd Girod, "Model-based 3-D shape and motion estimation using sliding textures," in Proceedings Vision, Modelling and Visualization 2001, Stuttgart, Germany, November 2001.
[10] Christopher C. Paige and Michael A. Saunders, "LSQR: An algorithm for sparse linear equations and sparse least squares," ACM Transactions on Mathematical Software, vol. 8, no. 1, pp. 43–71, March 1982.
[11] Gary J. Sullivan and Thomas Wiegand, "Rate-distortion optimization for video compression," IEEE Signal Processing Magazine, vol. 15, pp. 74–90, November 1998.