Creating a life-sized automultiscopic Morgan Spurlock for CNN's "Inside Man"
Andrew Jones¹, Jonas Unger², Koki Nagano¹, Jay Busch¹, Xueming Yu¹, Hsuan-Yueh Peng¹, Oleg Alexander¹, Paul Debevec¹
¹USC Institute for Creative Technologies  ²Linköping University
Figure 1: Three stereo photographs of Morgan Spurlock shown on the automultiscopic projector array. The display can be seen by multiple viewers over a 135° field of view without the need for special glasses. The images are left-right reversed for cross-fused stereo viewing.
We present a system for capturing and rendering life-size 3D human subjects on an automultiscopic display. Automultiscopic 3D displays allow a large number of viewers to experience 3D content simultaneously without the hassle of special glasses or head gear. Such displays are ideal for human subjects as they allow for natural personal interactions with 3D cues such as eye gaze and complex hand gestures. In this talk, we focus on a case study in which our system was used to digitize television host Morgan Spurlock for his documentary show "Inside Man" on CNN. Automultiscopic displays work by generating many simultaneous views with high angular density over a wide field of view. The angular spacing between views must be small enough that each eye perceives a distinct and different view. As the user moves around the display, each eye smoothly transitions from one view to the next. We generate multiple views using a dense horizontal array of video projectors. As video projectors continue to shrink in size, power consumption, and cost, it is now possible to closely stack hundreds of projectors so that their lenses are almost continuous. However, this display presents a new challenge for content acquisition: directly measuring every projector ray would require hundreds of cameras. We achieve similar quality with a new view interpolation algorithm suitable for dense automultiscopic displays.
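As a rough sanity check of this angular-density requirement, the sketch below compares the angle subtended by a viewer's two eyes with the display's view spacing. The interocular distance of 6.5 cm is a typical assumed value (not stated in the text); the 3.4 m radius and 0.625° spacing are the display parameters given later in this paper.

```python
import math

INTEROCULAR_M = 0.065   # assumed typical human eye separation (~6.5 cm)
VIEW_RADIUS_M = 3.4     # radius of the semi-circular projector array
VIEW_SPACING_DEG = 0.625  # angular spacing between adjacent projectors

# Angle subtended by the two eyes as seen from the display surface.
eye_angle_deg = math.degrees(2 * math.atan(INTEROCULAR_M / (2 * VIEW_RADIUS_M)))
print(f"eyes subtend {eye_angle_deg:.2f} deg at {VIEW_RADIUS_M} m")

# Stereo requires the view spacing to be finer than this angle,
# so that the left and right eye fall into different views.
assert VIEW_SPACING_DEG < eye_angle_deg
```

At 3.4 m the eyes subtend roughly 1.1°, so the 0.625° projector spacing indeed places the two eyes in distinct views.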
Our interpolation algorithm builds on Einarsson et al. [2006], who used optical flow to resample a sparse light field. While Einarsson et al. were limited to cyclical motions on a rotating turntable, we use an array of 30 unsynchronized Panasonic X900MK 60p consumer cameras spaced over 180 degrees to capture unconstrained motion. We first synchronize the videos to within 1/120 of a second by aligning their corresponding sound waveforms. We then compute pairwise spatial flow correspondences between cameras using GPU optical flow. As each camera pair is processed independently, the pipeline can be highly parallelized; as a result, we achieve much shorter processing times than traditional multi-camera stereo reconstructions. Our view interpolation algorithm maps images directly from the original video sequences to all the projectors in real time, and could easily scale to handle additional cameras or projectors. For the "Inside Man" documentary we recorded a 54-minute interview with Morgan Spurlock and processed 7 minutes of 3D video for the final show.
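The audio-based synchronization step can be illustrated with a simple cross-correlation of two sound tracks: the lag that maximizes the correlation is the temporal offset between cameras. This is a minimal sketch, not the production pipeline; `audio_offset_samples` is a hypothetical helper name.

```python
import numpy as np

def audio_offset_samples(ref: np.ndarray, other: np.ndarray) -> int:
    """Estimate the lag (in samples) of `other` relative to `ref`
    by locating the peak of their cross-correlation."""
    # Zero-mean both tracks so silence does not bias the correlation.
    ref = ref - ref.mean()
    other = other - other.mean()
    corr = np.correlate(other, ref, mode="full")
    # Shift the peak index so that 0 means the tracks are aligned.
    return int(np.argmax(corr)) - (len(ref) - 1)

# Toy example: the same waveform delayed by 30 samples.
rng = np.random.default_rng(0)
track = rng.standard_normal(1000)
delayed = np.concatenate([np.zeros(30), track])[:1000]
print(audio_offset_samples(track, delayed))  # → 30
```

Dividing the recovered offset by the audio sample rate gives the time shift used to align the 60p video streams to within half a frame.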
Figure 2: (left) Seven of the cameras used to capture the performance. (right) The array of 216 video projectors used to display the subject.

Our projector array consists of 216 video projectors mounted in a semi-circle with a 3.4 m radius. The narrow 0.625° spacing between projectors provides a large display depth of field with minimal aliasing. We use LED-powered Qumi v3 projectors in a portrait orientation (Fig. 2). At this distance the projected pixels fill a 2 m tall anisotropic screen with a life-size human body (Fig. 1). The screen material consists of a vertically anisotropic light-shaping diffuser manufactured by Luminit Co. The material scatters light vertically (60°) so that each pixel can be seen at multiple viewing heights, while maintaining a narrow horizontal blur (1°) that smoothly fills in the gaps between adjacent projectors. More details on the screen material can be found in Jones et al. [2014]. We use six computers to render the projector images. Each computer contains two ATI Eyefinity 7800 graphics cards with 12 total video outputs. Each video signal is then divided three ways using a Matrox TripleHead2Go HDMI splitter.
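The signal chain described above can be sanity-checked with a few lines of arithmetic; all figures are taken from the text.

```python
computers = 6
outputs_per_computer = 12  # two graphics cards, 12 video outputs per machine
splitter_fanout = 3        # each output split three ways by an HDMI splitter

projectors = computers * outputs_per_computer * splitter_fanout
print(projectors)  # → 216, matching the size of the projector array

# Angular coverage: 216 projectors at 0.625-degree spacing.
print(projectors * 0.625)  # → 135.0, the display's field of view in degrees
```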
In the future, we plan to capture longer-format interviews and other dynamic performances. We are working to incorporate natural language processing to allow for true interactive conversations with realistic 3D humans.
EINARSSON, P., CHABERT, C.-F., JONES, A., MA, W.-C., LAMOND, B., HAWKINS, T., BOLAS, M., SYLWAN, S., AND DEBEVEC, P. 2006. Relighting human locomotion with flowed reflectance fields. In Rendering Techniques 2006: 17th Eurographics Symposium on Rendering, 183–194.

JONES, A., NAGANO, K., LIU, J., BUSCH, J., YU, X., BOLAS, M., AND DEBEVEC, P. 2014. Interpolating vertical parallax for an autostereoscopic 3D projector array.