in: Proceedings of the Tenth IEEE International Conference on Computer Vision, pp. 1292-1299, Beijing, China,
October 15-21, 2005.
Squaring the Circle in Panoramas
Lihi Zelnik-Manor
Gabriele Peters
Pietro Perona
1. Dept. of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125, USA
2. Informatik VII (Graphische Systeme), Universität Dortmund, Dortmund, Germany
Pictures taken by a rotating camera cover the viewing
sphere surrounding the center of rotation. Having a set of
images registered and blended on the sphere what is left to
be done, in order to obtain a flat panorama, is projecting
the spherical image onto a picture plane. This step is unfor-
tunately not obvious – the surface of the sphere may not be
flattened onto a page without some form of distortion. The
objective of this paper is discussing the difficulties and op-
portunities that are connected to the projection from view-
ing sphere to image plane. We first explore a number of al-
ternatives to the commonly used linear perspective projec-
tion. These are ‘global’ projections and do not depend on
image content. We then show that multiple projections may
coexist successfully in the same mosaic: these projections
are chosen locally and depend on what is present in the pic-
tures. We show that such multi-view projections can pro-
duce more compelling results than the global projections.
1. Introduction
As we explore a scene we turn our eyes and head and cap-
ture images in a wide field of view. For millennia painters
and (more recently) photographers have grappled with the
problem of creating pictures that render the visual impres-
sion of ‘being there’. Recent advances in storage, com-
putation and display technology have made it possible to
develop ‘virtual reality’ environments where the user feels
‘immersed’ in a virtual scene and can explore it by mov-
ing within it. However, the humble still picture, painted
or printed on a flat surface, is still a popular medium: it
is inexpensive to reproduce, easy and convenient to carry,
store and display. Even more importantly, it has unrivaled
size, resolution and contrast. Furthermore, the advent of in-
expensive digital cameras, their seamless integration with
computers, and recent progress in detecting and matching
informative image features [4] together with the develop-
ment of good blending techniques [7, 5] have made it possi-
ble for any amateur photographer to produce automatically
mosaics of photographs covering very wide fields of view
and conveying the vivid visual impression of large panora-
mas, something that so far was the exclusive preserve of
the artist. Such mosaics are superior to panoramic pictures
taken with conventional fish-eye lenses in many respects:
they may span wider fields of view, they have unlimited
resolution, they make use of cheaper optics and they are not
restricted to the projection geometry imposed by the lens.
The geometry of single view point panoramas has long
been well understood [12, 21]. This has been used for mo-
saicing of video sequences (e.g., [13, 20]) as well as for ob-
taining super-resolution images (e.g., [6, 23]). By contrast
when the point of view changes the mosaic is ‘impossible’
unless the structure of the scene is very special. Let’s ex-
plore for a moment the ‘easy’ case, where all pictures share
the same center of projection C. If we consider the viewing
sphere, i.e. the unit sphere centered in C, we may identify
each pixel in each picture with the ray connecting C with
that pixel and passing through the surface of the viewing
sphere, as well as through the physical point in the scene
that is imaged by that pixel. By detecting and matching vi-
sual features in different images we may register automat-
ically the images with respect to each other. We may then
map every pixel of every image we collected to the corre-
sponding point of the viewing sphere and obtain a spheri-
cal image that summarizes all our information on the scene.
This spherical image is the most natural representation: we
may represent this way a scene of arbitrary angular width
and if we place our head in C, the center of the sphere, we
may rotate it around and capture the same images as if we
were in the scene.
What is left to be done, in order to obtain our panorama-
on-a-page, is projecting the spherical image onto a picture
plane. This step is unfortunately not obvious – the surface
of the sphere may not be flattened onto a page without some
form of distortion. The choice of projection from the sphere
to the plane has been dealt with extensively by painters and
cartographers. An excellent review is provided in [9].
The best known projection is linear perspective (also
called ‘gnomonic’ and ‘rectilinear’). It may be obtained by
projecting the relevant points of the viewing sphere onto a
tangent plane, by means of rays emanating from the cen-
ter of the sphere C. Linear perspective became popular
amongst painters during the Renaissance. Brunelleschi is
credited with being the first to use correct linear perspec-
tive. Alberti wrote the first textbook on linear perspective
describing the main construction methods [1]. It is believed
by many to be the only ‘correct’ projection because it maps
lines in 3D space to lines on the 2D image plane and be-
cause when the picture is viewed from one special point, the
‘center of projection’ of the picture, the retinal image that is
obtained is the same as when observing the original scene.
A further, somewhat unexpected, virtue is that perspective
pictures look ‘correct’ even if the viewer moves away from
the center of projection, a very useful phenomenon called
‘robustness of perspective’ [18, 22].
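To make the peripheral "explosion" concrete, here is a minimal Python sketch of the gnomonic (linear perspective) mapping, taking the tangent point as the optical axis. This is an illustration under standard definitions, not code from the paper.

```python
import math

def gnomonic(theta, phi):
    """Project a viewing direction onto the plane tangent at the optical axis.

    theta: angle from the optical axis (radians), must be < pi/2
    phi:   azimuth around the axis
    Returns (u, v) on the tangent plane (linear perspective).
    """
    if theta >= math.pi / 2:
        raise ValueError("linear perspective covers less than 180 degrees")
    r = math.tan(theta)  # radial distance diverges as theta approaches 90 degrees
    return (r * math.cos(phi), r * math.sin(phi))

# Plane area spent on one degree of visual angle: near the center vs. at the periphery.
center_scale = gnomonic(math.radians(1), 0.0)[0] - gnomonic(0.0, 0.0)[0]
edge_scale = gnomonic(math.radians(80), 0.0)[0] - gnomonic(math.radians(79), 0.0)[0]
```

The last two lines show the effect described above: the same one degree of visual angle occupies roughly thirty times more image plane at 80° eccentricity than at the center.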
Unfortunately, linear perspective has a number of draw-
backs. First of all: it may only represent scenes that are
at most 180° wide: as the field of view becomes wider,
the area of the tangent plane dedicated to representing one
degree of visual angle in the peripheral portion of the pic-
ture becomes very large compared to the center, and even-
tually becomes unbounded. Second, there is an even more
stringent limit to the size of the visual field that may be
represented successfully using linear perspective: beyond
widths of 30°, architectural structures (parallelepipeds)
appear to be distorted, despite the fact that their edges are
straight [18, 14]. Furthermore, spheres that are not in the
center of the viewing field project to ellipses on the image
plane and appear unnatural and distorted [18] (see Fig 1). A
similar phenomenon affects cylinders. Renaissance painters
knew of these shortcomings and adopted a number of cor-
rective measures [14], some of which we will discuss later.
The objective of this paper is discussing the difficulties
and opportunities that are connected to the projection from
viewing sphere to image plane, in the context of digital im-
age mosaics. We first explore a number of alternatives to
linear perspective which were developed by painters and
cartographers. These are ‘global’ projections and do not
depend on image content. We explore experimentally the
tradeoffs of these projections: how they distort architec-
ture and people and how well they tolerate wide fields
of view. We then show that multiple projections may co-
exist successfully in the same mosaic: these projections are
chosen locally and depend on what is seen in the pictures
that form the mosaic. We conclude with a discussion of the
work that lies ahead.
In this paper we do not address issues of image regis-
tration and image blending and instead rely on the code by
Brown and Lowe [4, 2] for our experiments.
Figure 1: Perspective distortions. Left: Five photographs
of the same person taken by a rotating camera, after rec-
tification (removing spherical lens distortion). Right: An
overlay of the five photographs after blackening everything
but the person’s face. This shows that spherical objects look
distorted under perspective projection even at mild viewing
angles. For example, in the above figure, the centers of the
faces in the corners are at 20° horizontal eccentricity.
2 Global Projections
What are the alternatives to linear perspective?
An important drawback of linear perspective is the ex-
cessive scaling of sizes at high eccentricities. Consider
a painter taking measurements in the scene by using her
thumb and using these measurements to scale objects on
the canvas. She takes angular measurements in the scene
and translates them into linear measurements onto the can-
vas. This construction is called Postel projection [9]. It
avoids the ‘explosion’ of sizes in the periphery of the pic-
ture. Along lines radiating from the point where the picture
plane touches the viewing sphere, it actually maps lengths
on the sphere to equal lengths in the image. Lines that run
orthogonal to those (i.e., concentric circles around the tan-
gent point) will be magnified at higher eccentricities, but
much less than by linear perspective. The Postel projection
is close to the cartographic stereographic projection. The
stereographic projection is obtained by using the pole oppo-
site to the point of tangency as the center of projection.
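Both azimuthal projections have simple closed forms (standard cartographic definitions, not the authors' code): the Postel projection maps eccentricity theta directly to radial image distance, while the stereographic uses 2·tan(theta/2).

```python
import math

def postel(theta, phi):
    """Azimuthal equidistant ('thumb measurement') projection: arc lengths
    along rays from the tangent point map to equal image lengths, so the
    radial distance is just the angle theta itself."""
    return (theta * math.cos(phi), theta * math.sin(phi))

def stereographic(theta, phi):
    """Project from the pole opposite the tangent point; conformal, and
    finite for every theta < pi (not just theta < pi/2)."""
    r = 2.0 * math.tan(theta / 2.0)
    return (r * math.cos(phi), r * math.sin(phi))

# Radial magnification at 60 degrees eccentricity, vs. linear perspective:
theta = math.radians(60)
r_postel = postel(theta, 0.0)[0]
r_stereo = stereographic(theta, 0.0)[0]
r_persp = math.tan(theta)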
Consider now the situation in which we wish to repre-
sent a very wide field of view. A viewer contemplating a
wide panorama will rotate his head around a vertical axis in
order to take in the full view. Suppose now that the view
has been transformed into a flat picture hanging on a wall
and consider a viewer exploring that picture: the viewer will
walk in front of the picture with a translatory motion that is
parallel to the wall. If we replace rotation around a vertical
axis with sideways translation in front of the picture we ob-
tain a family of projections which are popular with cartog-
raphers. Wrap a sheet of paper around the viewing sphere
forming a cylinder that touches the sphere at the equator.
One may project the meridians onto the cylinder by main-
taining lengths along vertical lines, thus obtaining the ge-
ographic projection. Alternatively, one may want to vary
locally the scale of the meridians so that they keep in pro-
portion with the parallels. This is the Mercator projection
(for mathematical definitions of these projections see [16]).
Perspective Geographic Mercator Transverse Mercator Stereographic
Figure 2: Spherical projections. Figures taken out of Matlab’s help pages visualizing the distortions of various projections.
Grid lines correspond to longitude and latitude lines. Small circles are placed at regular intervals across the globe. After
projection, the small circles appear as ellipses (called Tissot indicatrices) of various sizes, elongations, and orientations.
The sizes and shapes of the ellipses reflect the projection distortions.
Figure 2 visualizes the properties of these projections. In
this visualization grid lines correspond to longitude and lat-
itude lines. When projecting images onto the sphere, verti-
cal lines are projected onto longitude lines. Horizontal lines
are not projected onto latitude lines but rather onto tilted
great circles, thus the visualization of the latitude lines does
not convey what happens to horizontal image lines. All of
these projections are global and are independent of the im-
age content.
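The two cylindrical projections discussed above can be sketched in a few lines (standard cartographic formulas, not the paper's implementation). The key difference is the vertical scale: geographic keeps meridian lengths, Mercator matches the meridian scale to the sec(lat) stretch of the parallels, which is exactly the conformality property.

```python
import math

def geographic(lon, lat):
    """Equirectangular: vertical (meridian) lengths are preserved, so
    vertical lines stay straight; shapes stretch at high latitudes."""
    return (lon, lat)

def mercator(lon, lat):
    """Meridian scale matched to the parallel scale 1/cos(lat), which
    makes the map conformal (local shapes, e.g. faces, are preserved)."""
    return (lon, math.log(math.tan(math.pi / 4.0 + lat / 2.0)))

# Conformality check: the vertical stretch of the Mercator at latitude L
# should equal the horizontal stretch 1/cos(L) of the parallels.
L, eps = 0.8, 1e-6
vertical_scale = (mercator(0.0, L + eps)[1] - mercator(0.0, L)[1]) / eps
```

The numeric derivative confirms that at latitude 0.8 rad the vertical stretch equals 1/cos(0.8), so circles on the sphere stay circles in the image, at the cost of growing overall.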
Figure 3 illustrates the above projections on a panorama
constructed of images taken at an indoor scene. This is a
typical example of panoramas of man-made environments
which usually contain many straight lines. Selecting from
the above projections implies bending either the horizon-
tal lines, the vertical lines, or both. In most cases a bet-
ter choice is to keep vertical lines straight as this results in
a panorama where narrow vertical slits look correct. This
matches the observations in [22], which shows that our per-
ception of a picture is affected by the fact that normally peo-
ple shift their gaze horizontally and rarely shift it vertically.
Shifting one’s gaze horizontally across a panorama looks
best when vertical lines are not bent. This motivates the
use of either the Geographic or the Mercator projections, as
both keep vertical lines straight. In both these projections
the rotation of the camera is transformed into sideways mo-
tion of the observer.
When the camera performs mostly pan motion, i.e.,
when the vertical angle is small, both projections produce
practically the same result. However, for larger tilt an-
gles the Geographic projection distorts circles, i.e., it does
not maintain correct proportions, while the Mercator does
maintain conformality, thus the Mercator projection is a bet-
ter option (see Figure 4). Note that the conformality im-
plies that in the Mercator projection spherical and cylindri-
cal objects, such as people, are not distorted but the back-
ground is, see for example Figure 8.
An important issue in all cylindrical projections is the
choice of equator. Once the images are on the sphere one
can rotate the sphere in any desired way before projecting to
the plane. In other words, the cylinder wrapping the sphere
can touch the sphere along an equator of choice. When a
wrong equator is selected, vertical lines in 3D space will
not be projected onto vertical lines in the panorama (see left
panel of Figure 5). Finding the correct equator is easy. The
user is requested to mark a single vertical line and a horizon
point in one (or two) of the input images. The sphere is
then rotated so that projection of the marked vertical line
aligns with a longitude line and the equator goes through
the selected horizon point. This results in a straightened
panorama, see for example, right panel of Figure 5.
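The straightening step amounts to one rotation of the viewing sphere: the scene's "up" direction, estimated from the user-marked vertical line, is rotated onto the pole axis of the cylinder. A sketch using the standard Rodrigues formula (the `up_estimate` value below is a hypothetical measurement, and this is not the authors' code):

```python
import numpy as np

def rotation_aligning(a, b):
    """Rotation matrix taking unit vector a to unit vector b (Rodrigues).
    Used here to rotate the viewing sphere so that the estimated 'up'
    direction becomes the pole axis before cylindrical projection."""
    a = np.asarray(a, float) / np.linalg.norm(a)
    b = np.asarray(b, float) / np.linalg.norm(b)
    v, c = np.cross(a, b), float(np.dot(a, b))
    s2 = float(np.dot(v, v))
    if s2 < 1e-15:
        if c > 0:
            return np.eye(3)                      # already aligned
        # antipodal case: 180-degree turn about any axis perpendicular to a
        p = np.array([1.0, 0.0, 0.0]) if abs(a[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
        n = np.cross(a, p)
        n /= np.linalg.norm(n)
        return 2.0 * np.outer(n, n) - np.eye(3)
    K = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + K + K @ K * ((1 - c) / s2)

# Rotate all viewing directions so the marked vertical maps to a longitude line:
up_estimate = np.array([0.0, 1.0, 1.0])           # hypothetical measured 'up'
R = rotation_aligning(up_estimate, np.array([0.0, 0.0, 1.0]))
```

Applying `R` to every viewing direction before the cylindrical mapping is what aligns the marked vertical with a longitude line and places the equator through the chosen horizon point.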
Should other projections be considered? Yes, we think
so. The Transverse Mercator projection is known in
the mapping world as an excellent choice for mapping ar-
eas that are elongated north-to-south. This corresponds to
panoramas with little pan motion and large tilt motion. The
bending of vertical lines is small near the meridian, thus,
when the pan angle is small we are better off using the
Transverse Mercator projection which keeps the horizontal
lines straight. This is illustrated in Figures 4, 6.
For far away outdoor scenes almost any projection looks
good as the scenes rarely contain any straight lines. Never-
theless, too much bending might disturb the eye even on
free form objects like clouds. This suggests using the
stereographic projection, which bends both vertical and hor-
izontal lines but less than the cylindrical projections.
3 Multi-View Projection
The projections explored in Section 2 are ‘global’, in that
once a tangent point or a tangent line is chosen, the pro-
jection is completely determined by this parameter. This is
by no means a necessary property for a good projection. We
may instead tailor the projection locally to the content of the
images in order to improve the final effect. We next explore
a few options for such multi-view projections.
Perspective Transverse Mercator
Mercator Stereographic
Geographic Multi-Plane
Figure 3: Spherical projections. There are many spherical projections. Each has its pros and cons.
Figure 4: Preserving proportions. In the Geographic pro-
jection the circular pot at the bottom of the panorama is
distorted into an ellipse. In the Mercator projection this
does not happen.
3.1 Multi-Plane Perspective Projection
As was shown in Section 2, a global projection of wide
panoramas bends lines, which is unpleasant to the eye. To
obtain both a rectilinear appearance and a large field of view
we suggest using a multi-plane perspective projection. Such
multi-plane projections were suggested by Greene [11] for
rendering textured surfaces. Rather than projecting the
sphere onto a single plane, multiple tangent planes to the
sphere are used. Each projection is linear perspective. The
tangent planes have to be arranged so that they may be un-
folded into a flat surface without distortion, e.g., the points
of tangency belong to a maximal circle. One may think
of the intersections of the tangent planes being fitted with
hinges that allow flattening. The projection onto each plane
is perspective and covers only a limited field of view, thus it
is pleasant to the eye.
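The unfolding described above can be sketched for the horizontal coordinate alone, with vertical hinges between the tangent planes. Each strip is an ordinary perspective projection about its own tangent azimuth; constant offsets make the unfolded strips meet exactly at the seams. This is a simplified illustration, not the paper's implementation:

```python
import math

def multi_plane_x(phi, centers, seams):
    """Horizontal multi-plane coordinate for azimuth phi (radians).

    centers[i] is the tangent azimuth of plane i; seams are the hinge
    azimuths between consecutive planes (len(seams) == len(centers) - 1).
    Each plane is a linear-perspective projection; cumulative offsets
    make the unfolded strips continuous across each hinge."""
    offsets = [0.0]
    for i, s in enumerate(seams):
        step = math.tan(s - centers[i]) - math.tan(s - centers[i + 1])
        offsets.append(offsets[-1] + step)
    i = sum(1 for s in seams if phi > s)   # which plane this azimuth falls on
    return math.tan(phi - centers[i]) + offsets[i]
```

Positions are continuous across the seam, but the derivative (the local magnification) is not, which is why the seams must be placed along natural orientation discontinuities in the scene.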
This process introduces large orientation discontinuities
at the intersection between the projection planes, however,
in many man-made environments these discontinuities will
not be noticed if they occur along natural discontinuities.
The tangent planes must therefore be chosen in a way that
fits the geometry of the scene, e.g. so that the vertical edges
of a room project onto the seams and each projection plane
corresponds to a single wall. Orientation discontinuities
caused by the projection this way co-occur with orientation
discontinuities in the scene and therefore they are visually
unnoticeable (see Figures 3, 8, 6). Sometimes no seam may
be found that completely corresponds to discontinuities in
the scene: for example in Figure 9 the chair on the right
is clearly distorted. Another caveat is that some arrange-
ments will cause a loss in the impression of depth: for ex-
ample, when projecting a panorama of a standard room onto
a square prism (see left panel of Figure 7). Most often the
sensation of depth can be maintained by appropriate choice
of the projection planes (see right panel of Figure 7).
We have currently implemented a simple user inter-
face to allow choosing the position of the multiple tan-
gent planes. We assume that the hinges between tangent
Mercator With Wrong Equator Mercator With Correct Equator
Figure 5: Choice of equator. Panoramas of the Pantheon. A wrong choice of the equator results in tilted vertical lines. The
columns on the right and left appear to converge. Correcting the equator selection results in columns standing upright.
Perspective Geographic Mercator Transverse Mercator Multi-Plane
Figure 6: Vertical panoramas. Left and right panels show results before and after cropping (see Section 4 for further details).
For wide angle panoramas, perspective cannot capture the full range, thus the photographer’s legs are excluded. Geographic
distorts proportions (see how squashed the legs look). Mercator stretches the legs across the bottom. Transverse-Mercator
captures both the sculpture and the photographer, which suggests it is the best global projection option for narrow vertical
panoramas. Multi-Plane does even better.
planes are associated with either vertical or horizontal lines:
the user is presented with the Geographic projection of the
panorama and clicks once anywhere on a single vertical line
to choose a seam and once again to choose the point of tan-
gency of each projection plane. Automating this operation
is an interesting exercise which we leave for the future.
3.2 Preserving Foreground Objects
The multi-plane perspective projection takes us back to the
second challenge presented in Section 1. Recall, that even
for small fields of view nearby (foreground) objects are of-
ten perceived as distorted. Our solution to this problem
draws its inspiration from the Renaissance artists.
During the Renaissance the rules of perspective were un-
derstood, and linear perspective was used to produce pic-
tures that had a realistic look. Painters noticed earlier on,
that spheres and cylinders (and therefore people) would ap-
pear distorted if they were painted according to the rules of
a global perspective projection (a sphere will project to an
ellipse). It thus became common practice to paint people,
spheres and cylinders by using linear perspective centered
around each object. (see for example the The School of
Athens by Raphael [18, 14]). This results in paintings with
multiple view points. There is one global view point used
for the background and an additional view point for each
foreground person/object.
Renaissance paintings look good precisely because they
are constructed using a multiplicity of projections. Each
projection is chosen in order to minimize the apparent dis-
tortion of either the ambient architecture, or of a specific
person/object. We follow this example and adopt the multi-
view point approach to construct realistic looking panora-
mas. We first separate the background and foreground ob-
jects. A panorama is constructed from the background by
using a global projection: perspective for fields of view that
are narrower than, say, 40°, and Multi-Plane otherwise. The
foreground objects are projected using a ‘local’ perspec-
tive projection, with a central line of sight going through
the center of each object, and then they are pasted onto the
background. More in detail:
(1) Obtain a foreground-background segmentation for
each image and cut out the foreground objects [15, 19].
Currently we use the GIMP [10] implementation of In-
telligent Scissors [17], which requires manual interac-
tion; we found it to take less than a minute per image.
(2) Fill in the holes in the background caused by cutting
out the foreground objects using a texture propagation
technique (e.g., [8, 3]). We used our implementation of
[8]. Note that the hole filling need not be perfect as most
of it will be covered eventually by the repasting of the
foreground objects. As we are most sensitive to distortions
of people, one could acquire each picture containing a
person a second time, after the person has moved. In that
case hole filling won’t be required.
(3) Construct a panorama of the filled background images.
(4) Overlay foreground objects on top of the background
panorama. For each foreground object, find its bounding
box in the original image and in the panorama if it were
projected along with the background. Rescale the cut-
out object to have the same height as its projection (note
that the width will be different). Paste the object so that
the centers of the bounding boxes align.
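The geometry of step (4) reduces to a height-matched, center-aligned paste. A small sketch of that computation (the function name and box convention are illustrative, not from the paper):

```python
def paste_geometry(obj_box, proj_box):
    """Where and at what scale to paste a cut-out foreground object.

    obj_box / proj_box: (left, top, width, height) of the object in its
    source image and of its perspective projection in the panorama.
    The cut-out keeps its own aspect ratio; only its height is matched
    to the projected bounding box, and the box centers are aligned."""
    scale = proj_box[3] / obj_box[3]             # match heights
    w, h = obj_box[2] * scale, proj_box[3]
    cx = proj_box[0] + proj_box[2] / 2.0         # center of projected bbox
    cy = proj_box[1] + proj_box[3] / 2.0
    return (cx - w / 2.0, cy - h / 2.0, w, h)    # pasted box, centers aligned

# A 100x200 cut-out whose projection occupies an 80x100 box at (50, 60):
box = paste_geometry((0, 0, 100, 200), (50, 60, 80, 100))
```

Because only the height is rescaled, the pasted object keeps the proportions of its own local perspective rather than inheriting the stretch of the global projection.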
This process is illustrated in Figure 10. Five frames were
taken out of a video sequence showing a child walking from
right to left, while facing the camera. The child was cut out
from each image, texture propagation was used to fill in the
holes and a perspective panorama of the background was
constructed (see Figure 10 top). The cut-outs of the child
were then pasted onto the background in two ways: first
applying the same perspective projection used for the back-
ground, which resulted in distorting the child’s head into a
variety of ellipsoidal shapes (see Figure 10 middle); then
using the multi-view approach described above, which pro-
duced a significantly better looking result, removing all the
head distortions (see Figure 10 bottom). Another example
is displayed in Figure 11 (for this example we had avail-
able clear background images so hole filling was not re-
quired). Figure 9 displays our full solution including both
multi-plane projection for the background and multi-view
projection to correct the chair in the foreground.
4 Results
In all the experiments displayed in this paper the compu-
tation of the transformations of the input images to the
sphere was done using Matthew Brown’s Autostitch soft-
ware [4, 2].
When the images do not cover the full viewing sphere,
the boundaries of the panorama can have all sorts of shapes,
depending on the projection, e.g., see left panel of Fig-
ure 6. Thus, for visualization purposes, the panoramas were
cropped to display a complete rectangular portion. This re-
sults in different coverage areas for each projection. The
uncropped panoramas, as well as more results are provided
in the attached supplemental material.
5 Discussion & Conclusions
The challenge of constructing panoramas from images
taken from a single viewpoint goes beyond image matching
and image blending. The choice of the mapping between
the viewing sphere and the image plane is an interesting
problem in itself. Artists and cartographers have explored
this problem and have proposed a number of useful global
projections. Additionally, artists have developed a practice
to use multiple local projections which are guided by the
content of the images. Inspired by the artists we have pro-
posed a new set of projections which incorporate multiple
local projections with multiple view points into the same
panorama to produce more compelling results. Further au-
tomating this process is a worthwhile challenge for machine
vision researchers.
6 Acknowledgements
This research was supported by MURI award number
AS3318 and the Center of Neuromorphic Systems Engi-
neering award EEC-9402726. We also wish to acknowl-
edge our useful conversations with Pat Hanrahan, Jan Koen-
derink, Marty Banks, Bill Freeman, Ged Ridgway and
David Lowe and to thank Matthew Brown for providing his
Autostitch software.
[1] Leon Battista Alberti. On Painting. First appeared 1435-36.
Translated with Introduction and Notes by John R. Spencer.
New Haven: Yale University Press. 1970.
[2] Autostitch.
[3] M. Bertalmío, G. Sapiro, V. Caselles, and C. Ballester. Im-
age inpainting. In Proceedings of SIGGRAPH, New Orleans,
USA, July 2000.
[4] M. Brown and D. Lowe. Recognising panoramas. In Pro-
ceedings of the 9th International Conference on Computer
Vision, volume 2, pages 1218–1225, Nice, October 2003.
[5] P. J. Burt and Edward H. Adelson. A multiresolution spline
with application to image mosaics. ACM Trans. Graph.,
2(4):217–236, 1983.
[6] D. Capel and A. Zisserman. Automatic mosaicing with
super-resolution zoom. In CVPR ’98: Proceedings of the
IEEE Computer Society Conference on Computer Vision
and Pattern Recognition, page 885. IEEE Computer Society,
1998.
Straight Projection Oblique Projection
Figure 7: Multi-Plane projection. In each panel the top figure displays the geographic projection and the interaction
required by the user: definition of the intersection lines between the tangent planes (marked in blue) and the center of
projection for each tangent plane (marked in green and red). The middle panel displays a top view of the projection. The
bottom panel displays the final result.
[7] P. E. Debevec and J. Malik. Recovering high dynamic range
radiance maps from photographs. In Proceedings of SIG-
GRAPH, August 1997.
[8] A.A. Efros and Thomas K. Leung. Texture synthesis by non-
parametric sampling. In IEEE International Conference on
Computer Vision, pages 1033–1038, Corfu, Greece, Septem-
ber 1999.
[9] A. Flocon and A. Barre. Curvilinear Perspective, From Vi-
sual Space to the Constructed Image. University of Califor-
nia Press, 1987.
[10] The GIMP.
[11] N. Greene. Environment mapping and other applications of
world projections. IEEE Computer Graphics and Applica-
tions, 6(11):21–29, November 1986.
[12] R. I. Hartley and A. Zisserman. Multiple View Geometry in
Computer Vision. Cambridge University Press, 2000.
[13] M. Irani, B. Rousso, and S. Peleg. Computing occluding
and transparent motions. Int. J. Comput. Vision, 12(1):5–16,
1994.
[14] M. Kubovy. The Psychology of Perspective and Renaissance
Art. Cambridge University Press, 1986.
[15] Y. Li, J. Sun, C.K. Tang, and H. Shum. Lazy snapping. In
Proceedings of SIGGRAPH, 2004.
[16] MathWorld.
[17] E.N. Mortensen and W.A. Barrett. Intelligent scissors for
image composition. In SIGGRAPH ’95: Proceedings of the
22nd annual conference on Computer graphics and interac-
tive techniques, pages 191–198. ACM Press, 1995.
[18] M. H. Pirenne. Optics, Painting & Photography. Cambridge
University Press, 1970.
[19] C. Rother, V. Kolmogorov, and A. Blake. GrabCut: inter-
active foreground extraction using iterated graph cuts. Proc.
ACM Siggraph, 2004.
[20] H. S. Sawhney and R. Kumar. True multi-image alignment
and its application to mosaicing and lens distortion correc-
tion. IEEE Trans. Pattern Anal. Mach. Intell., 21(3):235–
243, 1999.
[21] R. Szeliski and H. Shum. Creating full view panoramic im-
age mosaics and environment maps. Computer Graphics,
31(Annual Conference Series):251–258, 1997.
[22] D. Vishwanath, A. R. Girshick, and M. S. Banks. Why pic-
tures look right when viewed from the wrong place. Personal
communication. (Manuscript accepted for publication).
[23] A. Zomet and S. Peleg. Applying super-resolution to
panoramic mosaics. In WACV ’98: Proceedings of the
4th IEEE Workshop on Applications of Computer Vision
(WACV’98), page 286. IEEE Computer Society, 1998.
Figure 8: Architecture vs. spherical objects. The perspective projection distorts people at large viewing angles. The
Mercator projection keeps the people undistorted, but distorts the wall and white-board in the background. The Multi-Plane
projection provides the most compelling result with no noticeable distortions in either background or people.
Figure 10: Correcting perspective distortions. Top:
Panorama of the background only. Artifacts in the hole
filling are visible, but are inessential as they will be even-
tually covered by the foreground object. Center: A global
perspective projection of both background and foreground.
The child’s head appears distorted. Bottom: A multi-view
point panorama providing the most compelling look with no
head distortions.
Mercator Multi-Plane Multi-Plane Multi-View
Figure 9: Multi-Plane Multi-View. The multi-plane projection rectified the background but the chair on the right is
distorted. Using the Multi-View approach the chair is undistorted.
Perspective Multi-View
Figure 11: Correcting perspective distortions. In the Perspective panorama the person’s head is highly distorted. A
Multi-view panorama provides a more compelling look, removing all distortions.

Supplementary resource (1)

... The adoption by Renaissance painters of local perspective projections to avoid the stretching caused by one single perspective inspired the authors of [8] to generate similar depictions in the context of computer generated images. The method in [9] also used narrower field of view projections but to rectify mainly indoor scenes. Both these techniques are limited by how wide the field of view of each projection is. ...
... Both these techniques are limited by how wide the field of view of each projection is. Although we use a combination of projections that resembles the one proposed in [9], our method overcomes this field of view limitation by using Möbius transformations instead of perspective projections. ...
... These schemes are mostly determined in terms of bidimensional regions (cages) in the domain and do not have the flexibility we need to deal with points, straight lines and polygons on the sphere. On the other hand, the result of the method in [9] with piecewise constant weights clearly shows the different projections on the buildings (highlighted in red). Ticks on the bottom indicate the vertical transitions between projections. ...
Full-text available
We propose a new method to correct distortions in equirectangular images, i.e., images that use the longitude/latitude representation to describe the full spherical field of view around a given viewpoint. We show that Möbius transformations are the correct mathematical tool to deal with the conflicting distortions in this setting: they are conformal, are able to rectify lines, perform rotations, translations and scales on the sphere and are bijective. Multiple transformations are specified through points, lines and cages on the sphere. We associate three points that uniquely define a Möbius transformation to each one of these geometric handles. Linear blend skinning with bounded biharmonic weights combine the different transformations into a single full spherical modified image. We present a collection of results in challenging settings and show that our method is more flexible and produces higher quality results when compared to previous methods.
... Photography-based methods to quantify green visual exposure (W. Wang et al., 2019; Yang et al., 2009), here used to validate and assess the developed method, are based on the assumption that the amount of greenery measured from photographs reflects the amount of greenery observed by people. However, researchers disagree on the extent to which photographs really capture what people see (compare e.g. Aoki et al. (1985) and Falfán et al. (2018)), for instance due to observer characteristics (Falfán et al., 2018) or photograph distortion due to lens settings and panorama projections (Aoki et al., 1985; Zelnik-Manor et al., 2005). The values of visual exposure to urban greenery measured from photographs or modelled by r.viewshed.exposure, ...
Urban trees have the potential to mitigate urban environmental problems caused by climate change and urbanization, and to increase the resilience of cities by delivering a range of ecosystem services. Spatial context, i.e. the location of trees in relation to structures and processes in their surroundings, has been recognized as an important mediator of the ecosystem services that trees provide. Geographical information systems (GIS) are useful for accurate, objective and efficient modelling of the spatial context of urban trees and for bringing this knowledge to urban forestry, urban planning, and other application areas aiming to support the ecosystem services of urban trees. However, the current use of GIS methods in modelling spatial context is limited. For more complex spatial contextual factors, suitable modelling methods are often lacking because such methods have not been developed, or the developed methods are not transferable beyond the study-specific scopes. The absence of spatial modelling methods might lead to disregarding spatial context and, in turn, inadequate accounting for the ecosystem services of urban trees relevant to a wide range of application purposes, including strategic tree planting, urban ecosystem accounting and public awareness-raising. This thesis addressed this research gap by developing GIS methods for modelling selected measures of the spatial context of urban trees, here referred to as spatial contextual factors. First, the thesis reviewed and synthesized the fragmented literature on the specific spatial contextual factors that mediate ecosystem services of urban trees. The review identified 114 specific factors that together mediate 31 ecosystem services of urban trees and clarified the conceptual understanding of spatial context in ecosystem services of urban trees. 
This, in turn, helped to justify the selection of the specific factors for which new GIS methods were developed and provided a conceptual guidance on approaching the modelling tasks. Second, the thesis developed five GIS methods for modelling selected spatial contextual factors that currently lack suitable GIS methods. The five developed methods enable modelling (i) visual exposure to tree canopy, (ii) individual tree visibility, (iii) tree crown light exposure and (iv) distance and (v) direction to nearest residential buildings. The method development was driven by the demands of specified application purposes. The first and the second methods were developed as flexible and efficient GRASS GIS tools usable for a broad range of research and practical applications. The remaining three methods were developed as alternatives to manual assessment for analyzing regulating ecosystem services from municipal trees in Oslo, Norway. The developed methods add to the emerging number of quantitative methods supporting ecosystem service quantification and assessment tailored to urban settings, and highlight the potential of GIS for high-resolution, large-scale ecosystem service assessment. Furthermore, the methods are transferable to modelling spatial context in other application areas to fulfil alternative purposes, including modelling the spatial context of structures other than trees. Finally, the thesis contributes to the literature by emphasizing the importance of spatial context for the delivery of ecosystem services from trees in urban landscapes specifically, and by improving our understanding of spatial context in ecosystem services of ecosystem assets in general.
... Until the last decade, hand-crafted methods such as SIFT [28], ORB [34], and BRISK [24] were widely used in several applications. However, applying them directly to panoramas tends to produce poor results due to strong distortions caused by planar mappings such as the ERP [41]. Cubemap projection involves six nearly planar projections with smaller FoVs, producing less distortions. ...
Pose estimation is a crucial problem in several computer vision and robotics applications. For the two-view scenario, the typical pipeline consists of finding point correspondences between the two views and using them to estimate the pose. However, most available keypoint extraction and matching methods were designed to work with perspective images and may fail under the non-affine distortions present in wide-angle or omnidirectional media, which are becoming increasingly popular in recent years. This paper presents a comprehensive comparative analysis of different keypoint matching algorithms for panoramas coupled to different linear and non-linear approaches for pose estimation. As an additional contribution, we explore a recent approach for mitigating spherical distortions using tangent plane projections, which can be coupled with any planar descriptor, and allows the adaptation of recent learning-based methods. We evaluate the combination of keypoint matching and pose estimation methods using the rotation and translation error of the estimated pose in different scenarios (indoor and outdoor), and our results indicate that SPHORB and "tangent SIFT" are competitive algorithms. We also show that tangent plane adaptations frequently present competitive results, and some optimization steps consistently improve the performance in all methods. We provide code at
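The tangent-plane idea mentioned in this abstract can be sketched with a gnomonic projection, which maps sphere directions near a tangent point onto a flat plane where any planar descriptor can then run. The function below is an illustrative assumption based on the standard gnomonic forward equations, not the cited paper's code; it is valid only for directions within 90° of the tangent point.

```python
import math


def gnomonic(lat, lon, lat0=0.0, lon0=0.0):
    """Project a (lat, lon) direction (radians) onto the plane tangent to the
    unit sphere at (lat0, lon0). Standard gnomonic forward equations."""
    cos_c = (math.sin(lat0) * math.sin(lat)
             + math.cos(lat0) * math.cos(lat) * math.cos(lon - lon0))
    x = math.cos(lat) * math.sin(lon - lon0) / cos_c
    y = (math.cos(lat0) * math.sin(lat)
         - math.sin(lat0) * math.cos(lat) * math.cos(lon - lon0)) / cos_c
    return x, y


# The tangent point itself lands at the plane origin; a point displaced by a
# small latitude offset lands at roughly tan(offset) on the y axis.
origin = gnomonic(0.0, 0.0)
```

A useful property for keypoint matching is that the gnomonic projection maps great circles to straight lines, so locally the projected patch behaves like an ordinary perspective image.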
Quantifying green visual exposure is necessary to assess aesthetic, social and health benefits from urban greenery. Viewshed analysis has been successfully used to model and map green visual exposure from human perspective in continuous representation and in places where street view imagery for widely-used photography-based methods is not available. However, current viewshed-based methods for modelling green visual exposure are often difficult to generalise beyond their specific application purpose, inefficient in processing large spatial extents and have limited use due to demands on technical knowledge. This hampers their wider use in research and practice. In this paper, we develop a viewshed analysis-based method for modelling visual exposure to urban greenery with special focus on the method’s applicability in research and practice. The method is implemented as a tool in GRASS GIS which makes it available as a practical and flexible tool. Extensive validation and assessment of the method on the specific case of urban trees confirm that the method is a highly accurate alternative to modelling visual exposure from street view imagery (ρ = 0.96) but that data quality and viewshed parametrisation are essential for achieving accurate results. Thanks to parallel processing and effective implementation, the method is applicable for city-wide scale analysis with high-resolution data on commodity hardware (here illustrated on the case of Oslo, Norway). Therewith, the method has potential application in many areas including strategic tree planting, scenario modelling and urban ecosystem accounting, as well as ecosystem service research.
... In 360° content streaming, viewport-based format is often preferred for coding and transmission [14], [15]. Other projection methods for image display [16], [17], [18] and visual recognition (e.g., icosahedron [19] and tangent images [20]) also emerge in the field of computer vision. With many possible projections at hand, it remains unclear which one is the best choice for learned 360° image compression in terms of rate-distortion performance, computation and implementation complexity, and compatibility with standard deep learning-based analysis/synthesis transform, and entropy model. ...
Although equirectangular projection (ERP) is a convenient form to store omnidirectional images (also known as 360-degree images), it is neither equal-area nor conformal, thus not friendly to subsequent visual communication. In the context of image compression, ERP will over-sample and deform things and stuff near the poles, making it difficult for perceptually optimal bit allocation. In conventional 360-degree image compression, techniques such as region-wise packing and tiled representation are introduced to alleviate the over-sampling problem, achieving limited success. In this paper, we make one of the first attempts to learn deep neural networks for omnidirectional image compression. We first describe parametric pseudocylindrical representation as a generalization of common pseudocylindrical map projections. A computationally tractable greedy method is presented to determine the (sub)-optimal configuration of the pseudocylindrical representation in terms of a novel proxy objective for rate-distortion performance. We then propose pseudocylindrical convolutions for 360-degree image compression. Under reasonable constraints on the parametric representation, the pseudocylindrical convolution can be efficiently implemented by standard convolution with the so-called pseudocylindrical padding. To demonstrate the feasibility of our idea, we implement an end-to-end 360-degree image compression system, consisting of the learned pseudocylindrical representation, an analysis transform, a non-uniform quantizer, a synthesis transform, and an entropy model. Experimental results on 19,790 omnidirectional images show that our method achieves consistently better rate-distortion performance than the competing methods. Moreover, the visual quality by our method is significantly improved for all images at all bitrates.
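The polar over-sampling this abstract refers to is easy to quantify under a simple assumption: every row of an equirectangular image carries the same number of pixels, but the circumference of the corresponding latitude circle shrinks with cos(latitude), so the horizontal over-sampling factor relative to the equator is 1/cos(lat). A minimal sketch (not the cited paper's method):

```python
import math


def erp_oversampling(row, height):
    """Horizontal over-sampling factor of an ERP image row (row 0 = top)
    relative to the equator: 1 / cos(latitude of the row center)."""
    lat = math.pi * (0.5 - (row + 0.5) / height)  # latitude of row center
    return 1.0 / math.cos(lat)


# Near the equator the factor is ~1; toward the poles it grows without bound,
# which is why region-wise packing and pseudocylindrical layouts exist.
near_equator = erp_oversampling(900, 1800)
near_pole = erp_oversampling(0, 1800)
```

For a 1800-row ERP image, the top row is stretched by roughly three orders of magnitude compared to the equator, which is exactly the redundancy a perceptually optimal bit allocation has to work around.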
... Various projection formats are now available for 360° images and videos. However, the transformation from a sphere into a 2D plane introduces several artifacts, such as sample redundancy, discontinuous boundaries, and shape distortion [13]. Redundancy in a sample causes many invalid pixels to be coded. ...
Currently available 360° cameras normally capture several images covering a scene in all directions around a shooting point. The captured images are spherical in nature and are mapped to a two-dimensional plane using various projection methods. Many projection formats have been proposed for 360° videos. However, standards for a quality assessment of 360° images are limited. In this paper, various projection formats are compared to explore the problem of distortion caused by a mapping operation, which has been a considerable challenge in recent approaches. The performances of various projection formats, including equi-rectangular, equal-area, cylindrical, cube-map, and their modified versions, are evaluated based on the conversion causing the least amount of distortion when the format is changed. The evaluation is conducted using sample images selected based on several attributes that determine the perceptual image quality. The evaluation results based on the objective quality metrics have proved that the hybrid equi-angular cube-map format is the most appropriate solution as a common format in 360° image services where format conversions are frequently demanded. This study presents findings ranking these formats, which are useful for identifying the best image format for a future standard.
We present ZoomShop, a photographic composition editing tool for adjusting relative size, position, and foreshortening of scene elements. Given an image and corresponding depth map as input, ZoomShop combines a novel non‐linear camera model and a depth‐aware image warp to reproject and deform the image. Users can isolate objects by selecting depth ranges and adjust their scale and foreshortening, which controls the paths of the camera rays through the scene. Users can also select 2D image regions and translate them, which determines the objective function in the image warp optimization. We demonstrate that ZoomShop can be used to achieve useful compositional goals, such as making a distant object more prominent while preserving foreground scenery, or making objects both larger and closer together so they still fit in the frame.
Spherical images or videos, as typical non-Euclidean data, are usually stored in the form of 2D panoramas obtained through an equirectangular projection, which is neither equal area nor conformal. The distortion caused by the projection limits the performance of vanilla Deep Neural Networks (DNNs) designed for traditional Euclidean data. In this paper, we design a novel Spherical Deep Neural Network (DNN) to deal with the distortion caused by the equirectangular projection. Specifically, we customize a set of components, including a spherical convolution, a spherical pooling, a spherical ConvLSTM cell and a spherical MSE loss, as the replacements of their counterparts in vanilla DNNs for spherical data. The core idea is to change the identical behavior of the conventional operations in vanilla DNNs across different feature patches so that they will be adjusted to the distortion caused by the variance of sampling rate among different feature patches. We demonstrate the effectiveness of our Spherical DNNs for saliency detection and gaze estimation in 360° videos. To facilitate the study of 360° video saliency detection, we further construct a large-scale 360° video saliency detection dataset. Comprehensive experiments validate the effectiveness of our proposed Spherical DNNs for spherical handwritten digit classification and sport classification, saliency detection and gaze tracking in 360° videos.
By recording the whole scene around the capturer, virtual reality (VR) techniques can provide viewers the sense of presence. To provide a satisfactory quality of experience, there should be at least 60 pixels per degree, so the resolution of panoramas should reach 21600 × 10800. The huge amount of data will put great demands on data processing and transmission. However, when exploring in the virtual environment, viewers only perceive the content in the current field of view (FOV). Therefore if we can predict the head and eye movements which are important behaviors of viewer, more processing resources can be allocated to the active FOV. But conventional saliency prediction methods are not fully adequate for panoramic images. In this paper, a new panorama-oriented model, to predict head and eye movements, is proposed. Due to the superiority of computation in the spherical domain, the spherical harmonics are employed to extract features at different frequency bands and orientations. Related low- and high-level features including the rare components in the frequency domain and color domain, the difference between center vision and peripheral vision, visual equilibrium, person and car detection, and equator bias are extracted to estimate the saliency. To predict head movements, visual mechanisms including visual uncertainty and equilibrium are incorporated, and the graphical model and functional representation for the switch of head orientation are established. Extensive experimental results on the publicly available database demonstrate the effectiveness of our methods.
We present a new, interactive tool called Intelligent Scissors which we use for image segmentation and composition. Fully automated segmentation is an unsolved problem, while manual tracing is inaccurate and laboriously unacceptable. However, Intelligent Scissors allow objects within digital images to be extracted quickly and accurately using simple gesture motions with a mouse. When the gestured mouse position comes in proximity to an object edge, a live-wire boundary "snaps" to, and wraps around the object of interest. Live-wire boundary detection formulates discrete dynamic programming (DP) as a two-dimensional graph searching problem. DP provides mathematically optimal boundaries while greatly reducing sensitivity to local noise or other intervening structures. Robustness is further enhanced with on-the-fly training which causes the boundary to adhere to the specific type of edge currently being followed, rather than simply the strongest edge in the neighborhood. Boundary cooling automatically freezes unchanging segments and automates input of additional seed points. Cooling also allows the user to be much more free with the gesture path, thereby increasing the efficiency and finesse with which boundaries can be extracted. Extracted objects can be scaled, rotated, and composited using live-wire masks and spatial frequency equivalencing. Frequency equivalencing is performed by applying a Butterworth filter which matches the lowest frequency spectra to all other image components. Intelligent Scissors allow creation of convincing compositions from existing images while dramatically increasing the speed and precision with which objects can be extracted.
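The live-wire formulation in this abstract casts boundary finding as a shortest-path search on the pixel graph. A minimal sketch of that idea using Dijkstra's algorithm follows; the per-pixel cost grid is a made-up stand-in for the paper's trained, gradient-based edge costs.

```python
import heapq


def live_wire(cost, seed, target):
    """Shortest path between two pixels on a 2D cost grid via Dijkstra's
    algorithm; low-cost cells play the role of strong image edges."""
    h, w = len(cost), len(cost[0])
    dist = {seed: 0.0}
    prev = {}
    pq = [(0.0, seed)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == target:
            break
        if d > dist[(r, c)]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                nd = d + cost[nr][nc]  # cost of stepping onto the neighbor
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(pq, (nd, (nr, nc)))
    # Backtrack from target to seed to recover the boundary path.
    path, node = [target], target
    while node != seed:
        node = prev[node]
        path.append(node)
    return path[::-1]
```

In the interactive tool the seed is the last fixed point and the target follows the cursor, so the optimal boundary is recomputed and "snaps" to cheap (edge-like) cells as the mouse moves.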
The problem of efficient, interactive foreground/background segmentation in still images is of great practical importance in image editing. Classical image segmentation tools use either texture (colour) information, e.g. Magic Wand, or edge (contrast) information, e.g. Intelligent Scissors. Recently, an approach based on optimization by graph-cut has been developed which successfully combines both types of information. In this paper we extend the graph-cut approach in three respects. First, we have developed a more powerful, iterative version of the optimisation. Secondly, the power of the iterative algorithm is used to simplify substantially the user interaction needed for a given quality of result. Thirdly, a robust algorithm for "border matting" has been developed to estimate simultaneously the alpha-matte around an object boundary and the colours of foreground pixels. We show that for moderately difficult examples the proposed method outperforms competitive tools.
We describe mosaicing for a sequence of images acquired by a camera rotating about its centre. The novel contributions are in two areas. First, in the automation and estimation of image registration: images (60+) are registered under a full (8 degrees of freedom) homography; the registration is automatic and robust, and a maximum likelihood estimator is used. In particular the registration is consistent so that there are no accumulated errors over a sequence. This means that it is not a problem ...
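The 8-degree-of-freedom homography mentioned above maps points between views of a rotating camera in homogeneous coordinates. As a small illustrative sketch (not the cited system), applying a 3x3 homography to a point:

```python
def apply_homography(H, x, y):
    """Map image point (x, y) through a 3x3 homography H (nested lists):
    compute H * (x, y, 1)^T and dehomogenize."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    wh = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / wh, yh / wh


# The identity homography leaves points unchanged, and scaling H by any
# non-zero constant gives the same mapping -- which is why a homography has
# 8, not 9, degrees of freedom.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
I3_scaled = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
```

Registering a 60+ image sequence then amounts to estimating one such H per image pair (robustly, e.g. with a maximum likelihood estimator as the abstract describes) and chaining them consistently onto a common frame.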
The problem considered in this paper is the fully automatic construction of panoramas. Fundamentally, this problem requires recognition, as we need to know which parts of the panorama join up. Previous approaches have used human input or restrictions on the image sequence for the matching step. In this work we use object recognition techniques based on invariant local features to select matching images, and a probabilistic model for verification. Because of this our method is insensitive to the ordering, orientation, scale and illumination of the images. It is also insensitive to 'noise' images which are not part of the panorama at all; that is, it recognises panoramas. This suggests a useful application for photographers: the system takes as input the images on an entire flash card or film, recognises images that form part of a panorama, and stitches them with no user input whatsoever.