H.264 BASED CODING OF OMNIDIRECTIONAL VIDEO
Ingo Bauermann, Matthias Mielke and Eckehard Steinbach
Technische Universität München, Institute of Communication Networks, Media Technology
Abstract: Omnidirectional video is an adequate image-based scene representation data
format for interactive walkthroughs or surround viewing of a scene over the
Internet. While efficient compression is a key issue in this context, the
properties of omnidirectional video are not in line with the assumptions that
are made about video sequences by modern compression standards like
MPEG-4 AVC/H.264. We introduce a preprocessing approach which
transforms omnidirectional video into a sequence of panoramic images before
encoding. Using a state of the art MPEG-4 AVC/H.264 video coder, our
approach performs up to 2dB better for low bit rates compared to regular
encoding of omnidirectional video.
Key words: omnidirectional video, video compression
Omnidirectional video systems are capable of capturing a 360-degree
horizontal field of view at one time. This wide view characteristic makes
them suitable for applications like interactive walkthroughs and surveillance
(e.g. [1,2]). Figure 1 shows one frame of an omnidirectional video sequence
and a perspective view generated from one portion of the frame.
This kind of dataset allows pan, tilt, zoom, and even translational
movement along the specific path the sequence was captured on. While the
main advantage of omnidirectional video is the wide field of view, the
specific geometric acquisition properties that lead to it contradict some
assumptions common video compression schemes are based on. To address
this we propose a coding scheme based on preprocessing of the captured
omnidirectional video sequence and subsequent encoding using the state-of-
the-art video codec MPEG-4 AVC/H.264. Figure 2 shows the processing
pipelines for direct encoding of omnidirectional video (DEOV) and coding
of preprocessed omnidirectional video (CPOV) investigated in this paper.
Figure 1. One frame (720x576) from the
“classroom” test sequence. The upper right
shows a perspective mapping from the
original omnidirectional image.
In the following, we describe issues with the DEOV approach that motivate
our CPOV approach. The number of reference pixels in the omnidirectional
view used to interpolate a pixel in the perspective view is a function of
image coordinates (u,v) in Figure 1. Depending on the shape of the mirror
and the lenses used for capturing the omnidirectional video, the introduced
distortion in the final view is inhomogeneous. For DEOV blocking-artifacts
caused by rate-constrained compression of the omnidirectional video result
in jagged vertical edges. The distribution and shape of blocking-artifacts
change for different viewing directions (see Figure 8). Motion estimation
and compensation performed in standard video coders assume translational
motion of small image blocks from one video frame to another. For
omnidirectional video this assumption does not hold. Motion of the camera
or in the scene results in a change of orientation and scaling of
corresponding image blocks in the omnidirectional image. In addition,
reconstructing perspective views directly from captured omnidirectional
frames is computationally intensive. Projection properties are often non-linear
and depend on the viewing direction.
To adapt to some extent to these properties, the proposed coding scheme
CPOV performs prewarping and subsampling steps on the captured sequence
before encoding. The resulting panoramic sequence is resampled in vertical
and horizontal direction before it is fed to the video encoder.
The catadioptric imaging system used in this work employs a hyperbolic
mirror and lens system from “Remote Reality”. For the omnidirectional
Figure 2. Processing pipelines (left) for direct
encoding of omnidirectional video sequences
(DEOV) and (right) for coding of
preprocessed omnidirectional video (CPOV).
video sequences captured with this device a computationally complex non-
linear projection is needed to generate perspective views.
The remainder of the paper is structured as follows. In Chapter 2 we
describe the prewarping and subsampling scheme for the preprocessing step.
Chapter 3 discusses the impact on the coding efficiency and rendering
complexity of our approach. In Chapter 4 experimental results are presented.
Chapter 5 gives concluding remarks.
The calibration of camera and mirror parameters is performed using the
algorithm described in  and . The pre-processing step consists of two
main parts. First a panoramic image is generated from the captured
omnidirectional frame using bilinear interpolation. Figure 3 shows a
panoramic image created from the “classroom sequence”. The mapping from
the omnidirectional image plane to the panorama is illustrated in Figure 5.
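The bilinear interpolation used to build the panorama can be sketched as follows. This is a minimal illustration under our own conventions: `bilinear_sample` is a hypothetical helper name, and the image is a plain list of grayscale rows rather than a real frame buffer.

```python
import math

def bilinear_sample(img, u, v):
    """Sample a grayscale image (list of rows) at real-valued
    coordinates (u, v) by bilinearly weighting the four neighbours."""
    h, w = len(img), len(img[0])
    u0, v0 = int(math.floor(u)), int(math.floor(v))
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    du, dv = u - u0, v - v0
    # Weighted average of the four surrounding pixels.
    return ((1 - du) * (1 - dv) * img[v0][u0]
            + du * (1 - dv) * img[v0][u1]
            + (1 - du) * dv * img[v1][u0]
            + du * dv * img[v1][u1])
```

Each panorama pixel (s, t) is filled by sampling the omnidirectional frame at the non-integer position (u(s, t), v(s, t)) in this way.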
Pixels of size A are mapped to an area P on the panorama plane via the
mirror surface. The area ratio between the two representations is given by

θ(s, t) = P(s, t) / A(u(s, t), v(s, t)).
Figure 4 shows θ as a function of t for a panorama size of 2500x343 pixels.
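Since θ relates panorama area to omnidirectional-image area, it can be estimated numerically from the Jacobian of the mapping (s, t) → (u(s, t), v(s, t)). The sketch below uses central finite differences; the function name and the toy linear mapping in the usage example are assumptions for illustration only.

```python
def area_ratio(u, v, s, t, eps=1e-5):
    """Estimate theta(s, t), the number of panorama (output) pixels per
    omnidirectional (input) pixel, as 1/|J| where J is the Jacobian of
    (s, t) -> (u(s, t), v(s, t)).  Partial derivatives are approximated
    with central finite differences."""
    du_ds = (u(s + eps, t) - u(s - eps, t)) / (2 * eps)
    du_dt = (u(s, t + eps) - u(s, t - eps)) / (2 * eps)
    dv_ds = (v(s + eps, t) - v(s - eps, t)) / (2 * eps)
    dv_dt = (v(s, t + eps) - v(s, t - eps)) / (2 * eps)
    jac = abs(du_ds * dv_dt - du_dt * dv_ds)  # area scaling of (s,t) -> (u,v)
    return 1.0 / jac
```

For example, a mapping that stretches s by 2 and t by 3 yields one sixth of an output pixel per input pixel.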
Figure 3. Panoramic image obtained after
prewarping the omnidirectional image.
Figure 4. Number of output pixels per
input pixel θ for a given panorama
size of 2500x343.
Figure 5. Projection from the image plane to
the panorama plane.
The resampling step scales the image dimensions independently in
horizontal and vertical direction. Let s_s and s_t be the scaling factors
for the s dimension and the t dimension, respectively. Then
s_s · s_t · θ(s, t) is the output pixel per input pixel ratio after the
pre-processing step. Parameters s_s and s_t are chosen by optimizing the
rate-distortion trade-off for the rendered view. The resulting sequence is
fed to the video encoder.
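As a toy illustration of this resampling step, the sketch below rescales a panorama by independent factors s_s (horizontal) and s_t (vertical). Nearest-neighbour sampling and the function name are our own assumptions; the paper does not prescribe a particular interpolation filter here.

```python
def resample(img, s_s, s_t):
    """Rescale an image (list of rows) by factor s_s horizontally and
    s_t vertically using nearest-neighbour sampling.  The output/input
    pixel ratio of the full pre-processing step is then s_s * s_t times
    the warping ratio theta."""
    h, w = len(img), len(img[0])
    new_w, new_h = max(1, round(w * s_s)), max(1, round(h * s_t))
    return [[img[min(h - 1, int(y / s_t))][min(w - 1, int(x / s_s))]
             for x in range(new_w)]
            for y in range(new_h)]
```

Halving the horizontal resolution of a 2x4 panorama, for instance, keeps every second column.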
IMPACT OF PREPROCESSING ON THE CODING EFFICIENCY
The preprojection and resampling steps in CPOV allow us to adjust
spatial resolution and quality in different areas of the rendered view
reconstructed from the panoramic image without changing the quantization
policy of the used standard video coder. The motivation for this is that the
rate-distortion optimization algorithms of the video coder cannot take into
account that warping steps have to be performed after decoding to generate
perspective views. Spatial adaptation of the quantization parameters would
solve this problem. This would mean that the encoder has to be modified in
such a way that quantization parameters are adjusted during the encoding
process according to the current image coordinates. Though no modification
of the decoder is needed in this case the client side viewing application has
to be aware of the intrinsic parameters of the specific capturing device. In
CPOV a general panoramic representation for omnidirectional video is used.
This representation is independent of the actual capturing device.
In the rendered view blocking boundaries have constant shape due to the
panoramic representation in CPOV. For DEOV the block size increases from
top to bottom. For the same reasons motion compensation performs better on
the preprojected frame as the motion model used in most standard coders
assumes translational motion of image blocks from one frame to the next.
The preprojection undistorts the trajectories of linearly moving objects to
some extent and compensates for scaling effects.
In CPOV, rendering perspective views from the compressed
omnidirectional video becomes less complex, as the mapping from the image
plane (x, y) to the normalized panoramic image coordinates (s, t), given
the height t0 of the focal point in the panoramic image and the focal
length fv of the virtual camera, reduces to a simple closed-form
expression.
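Under standard cylindrical-panorama geometry, the (x, y) → (s, t) mapping can be sketched as below. This is a plausible form only, assuming the usual conventions; the paper's exact expression (and its sign and normalization choices) may differ.

```python
import math

def panorama_coords(x, y, f_v, t0, pan=0.0):
    """Map a pixel (x, y) of a virtual perspective camera (principal
    point at the origin, focal length f_v) to normalized cylindrical
    panorama coordinates (s, t).  `pan` is the horizontal viewing
    direction and t0 the height of the focal point in the panorama."""
    s = (pan + math.atan2(x, f_v) / (2.0 * math.pi)) % 1.0  # azimuth, wraps at 1
    t = t0 + y / math.sqrt(x * x + f_v * f_v)               # height on cylinder
    return s, t
```

The per-pixel cost is one arctangent, one square root, and a few multiplications, which is what makes CPOV rendering cheap compared to the non-linear catadioptric projection needed in DEOV.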
For CPOV the mapping from the panoramic image coordinates (s, t) to
the omnidirectional image coordinates (u, v), including the mirror
calibration, is performed during the preprocessing step:

u(s, t) = c_{i,u} + (c_{o,u} - c_{i,u}) · t + [ r_{i,u} · P_i(t) · (1 - t) + r_{o,u} · P_o(t) · t ] · sin(2π(1 - s))
v(s, t) = c_{i,v} + (c_{o,v} - c_{i,v}) · t + [ r_{i,v} · P_i(t) · (1 - t) + r_{o,v} · P_o(t) · t ] · cos(2π(1 - s))
Here c_{i,u}, c_{i,v}, c_{o,u}, c_{o,v} and r_{i,u}, r_{i,v}, r_{o,u}, r_{o,v} denote the centers and radii of the
inner and outer calibration circles (see ). P_i and P_o denote 5th-order
polynomials calibrating the radial distortion. From the application's point
of view, the representation proposed here is independent of the geometrical
assembly and calibration of the capturing device.
To evaluate the performance of CPOV a viewer has been developed
capable of rendering identical views from the originally captured
omnidirectional frame and from reconstructed frames encoded using CPOV
and DEOV. The window size is 640x480 pixels. The test sequence is
“classroom”, a 40-frame video captured with an omnidirectional camera from
Remote Reality. Figure 6 shows the rate distortion plot for both CPOV
and DEOV. Rendered views from the reconstructed frames are compared to
the view rendered from the uncompressed original omnidirectional frame.
The rate is measured in bits per rendered pixel, where the file sizes of the
compressed representations are weighted by the number of rendered pixels
in the field of view of the virtual camera. For low rates the gain of CPOV
over DEOV is up to 2dB. Figure 7 shows a similar rate distortion plot for the
lower third of the rendered view. Up to 3dB better PSNR is achieved using
CPOV. The performance of the viewer was tested using the Video for
Windows interface and an XviD codec, version 1.0.0 RC2. The frame rate for
frame-by-frame navigation was about 30% higher for CPOV.
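The two evaluation measures used above can be reproduced in a few lines; the helper names are ours, not from the paper.

```python
import math

def bits_per_rendered_pixel(file_size_bytes, rendered_pixels):
    """Rate measure from the paper: compressed file size weighted by the
    number of rendered pixels in the virtual camera's field of view."""
    return 8.0 * file_size_bytes / rendered_pixels

def psnr(ref, rec, peak=255.0):
    """PSNR (dB) between a view rendered from the uncompressed original
    frame and the same view rendered from the reconstructed frame, both
    given as flat sequences of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
    return float('inf') if mse == 0 else 10.0 * math.log10(peak * peak / mse)
```

A 1000-byte compressed frame covering 8000 rendered pixels, for example, costs exactly 1 bit per rendered pixel.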
In this paper, a preprocessing technique (CPOV) was presented for
efficiently encoding omnidirectional video sequences. The pre-processed
omnidirectional video sequences are encoded using the state-of-the-art video
coder MPEG-4 AVC/H.264. By prewarping and resampling the captured
frame to a panoramic image, an adaptation to the assumptions standard video
coders are based on was investigated. A better tradeoff between quality in