Thesis

Multi-Camera Light Field Capture: Synchronization, Calibration, Depth Uncertainty, and System Design


Abstract

The digital camera is the technological counterpart to the human eye, enabling the observation and recording of events in the natural world. As modern life increasingly depends on digital systems, cameras, and especially multi-camera systems, are widely used in applications that affect our society, ranging from multimedia production and surveillance to the localization of self-driving robots. The rising interest in multi-camera systems is mirrored by rising activity in Light Field research, where multi-camera systems are used to capture Light Fields: the angular and spatial information about light rays within a 3D space. The purpose of this work is to gain a more comprehensive understanding of how cameras collaborate and produce consistent data as a multi-camera system, and to build a multi-camera Light Field evaluation system. This work addresses three problems related to the process of multi-camera capture: first, whether multi-camera calibration methods can reliably estimate the true camera parameters; second, what consequences synchronization errors have in a multi-camera system; and third, how to ensure data consistency in a multi-camera system that records data with synchronization errors. Furthermore, this work addresses the problem of designing a flexible multi-camera system that can serve as a Light Field capture testbed.

The first problem is solved by a comparative assessment of widely available multi-camera calibration methods. A dedicated dataset is recorded that provides known constraints on the ground-truth camera parameters, which serve as a reference for the calibration estimates. The second problem is addressed by introducing a depth uncertainty model that links the pinhole camera model and the synchronization error to the geometric error in the 3D projections of recorded data. The third problem is solved for the color-and-depth multi-camera scenario by estimating the synchronization error of the depth camera and correcting the recorded depth maps via tensor-based interpolation. The problem of designing a Light Field capture testbed is addressed empirically, by constructing and presenting a multi-camera system based on off-the-shelf hardware and a modular software framework.

The calibration assessment reveals that target-based and certain target-less calibration methods perform similarly at estimating the true camera parameters. The results imply that for general-purpose multi-camera systems, target-less calibration is an acceptable choice, while for high-accuracy scenarios even commonly used target-based calibration approaches are insufficiently accurate. The proposed depth uncertainty model shows that converged multi-camera arrays are less sensitive to synchronization errors. The mean depth uncertainty of a camera system correlates with the rendered result in depth-based reprojection, as long as the camera calibration matrices are accurate. The proposed depth map synchronization method produces a consistent, synchronized color-and-depth dataset from unsynchronized recordings without altering the depth map properties; the method therefore serves as a compatibility layer between unsynchronized multi-camera systems and applications that require synchronized color-and-depth data. Finally, the presented multi-camera system demonstrates a flexible, decentralized framework in which data processing is possible in the camera, in the cloud, and on the data consumer's side. The system can act as a Light Field capture testbed and as a component in Light Field communication systems, owing to general-purpose computing and network connectivity support for each sensor, small sensor size, flexible mounts, hardware and software synchronization, and a segmented software framework.
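The depth uncertainty model itself is not reproduced on this page, but the geometry it builds on can be illustrated with a toy calculation: in a rectified stereo pair under the pinhole model, a synchronization error turns lateral scene motion into a disparity error and hence a depth error. The sketch below is a minimal illustration under those assumptions, with hypothetical names and values; it is not the thesis's actual formulation.

```python
def depth_error_from_sync(Z, v, dt, baseline, f):
    """Approximate depth error when one camera of a rectified stereo
    pair samples the scene dt seconds late (hypothetical toy model).

    Z        : true depth of the observed point [m]
    v        : lateral speed of the point [m/s]
    dt       : synchronization error [s]
    baseline : stereo baseline [m]
    f        : focal length [px]
    """
    # Lateral motion during dt shifts the point's image in the
    # late camera, producing a disparity error in pixels.
    disparity_err = f * v * dt / Z
    # Propagate through Z = f * b / d:  |dZ| = (Z^2 / (f b)) |dd|.
    return (Z ** 2 / (f * baseline)) * disparity_err  # = Z * v * dt / b

# Example: a point 5 m away moving at 1 m/s, a 10 ms sync error,
# and a 20 cm baseline give ~0.25 m of depth error.
print(depth_error_from_sync(Z=5.0, v=1.0, dt=0.010, baseline=0.2, f=800.0))
```

Even this toy version shows the dependence on capture geometry: the error grows with depth and shrinks with baseline, consistent with the abstract's claim that array geometry governs sensitivity to synchronization errors.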
References
Article (full-text available)
Cameras are crucial exteroceptive sensors for self-driving cars: they are low-cost and small, provide appearance information about the environment, and work in various weather conditions. They can be used for multiple purposes, such as visual navigation and obstacle detection. We can use a surround multi-camera system to cover the full 360-degree field of view around the car and thus avoid blind spots, which can otherwise lead to accidents. To minimize the number of cameras needed for surround perception, we utilize fisheye cameras. Consequently, standard vision pipelines for 3D mapping, visual localization, obstacle detection, etc. need to be adapted to take full advantage of multiple cameras rather than treating each camera individually, and processing of fisheye images has to be supported. In this paper, we describe the camera calibration and subsequent processing pipeline for multi-fisheye-camera systems developed as part of the V-Charge project, which seeks to enable automated valet parking for self-driving cars. Our pipeline precisely calibrates multi-camera systems, builds sparse 3D maps for visual navigation, visually localizes the car with respect to these maps, generates accurate dense maps, and detects obstacles based on real-time depth map extraction.
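As a back-of-envelope illustration of the camera-count argument: the number of identical cameras needed for full 360-degree horizontal coverage follows from the horizontal field of view and the overlap one wants to keep between neighbouring views. The helper and the numbers below are illustrative assumptions, not values from the paper.

```python
import math

def cameras_for_surround(fov_deg, overlap_deg=10.0):
    """Minimum number of identical cameras for 360-degree horizontal
    coverage, keeping overlap_deg of overlap between neighbours
    (illustrative sketch; real placement also depends on mounting
    constraints and vertical coverage)."""
    effective = fov_deg - overlap_deg   # each camera's unique coverage
    return math.ceil(360.0 / effective)

print(cameras_for_surround(60.0))   # conventional lens -> 8 cameras
print(cameras_for_surround(185.0))  # fisheye lens      -> 3 cameras
```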
Conference Paper (full-text available)
Surveillance videos are becoming ubiquitous for monitoring and ensuring security. Nevertheless, mentally fusing the data from multiple video streams covering different regions imposes a high cognitive burden. In this paper we introduce Video Fields, a novel web-based interactive system to create, calibrate, and render dynamic video-based virtual reality scenes in head-mounted displays as well as on high-resolution, wide-field-of-view tiled display walls. The Video Fields system automatically projects dynamic videos onto user-defined geometries. It allows users to adjust camera parameters, navigate through time, walk around the scene, and see through buildings. Our system integrates background modeling and automatic segmentation of moving entities with the rendering of video fields. We present two methods to render video fields: early pruning and deferred pruning. Experimental results indicate that early pruning is more efficient, while deferred pruning achieves better quality through anti-aliasing and bilinear interpolation. We envision the use of the system and algorithms introduced in Video Fields for immersive surveillance monitoring in virtual environments.
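The step such a system builds on, projecting each calibrated video stream onto user-defined geometry, is standard projective texturing: every 3D point of the geometry is mapped through the camera's intrinsics and extrinsics to a pixel of the video frame. A minimal NumPy sketch of that mapping follows; it illustrates the principle only, not the paper's GPU implementation, and all names are hypothetical.

```python
import numpy as np

def project_to_video(points_w, K, R, t, width, height):
    """Map 3D scene points to pixel coordinates in one calibrated
    video stream (the projective-texturing step; illustrative only).

    points_w : (N, 3) world-space points on the scene geometry
    K        : (3, 3) camera intrinsics
    R, t     : world-to-camera rotation (3, 3) and translation (3,)
    """
    cam = points_w @ R.T + t               # world -> camera frame
    in_front = cam[:, 2] > 0               # cull points behind the camera
    pix = cam @ K.T
    pix = pix[:, :2] / pix[:, 2:3]         # perspective divide
    inside = (pix[:, 0] >= 0) & (pix[:, 0] < width) \
           & (pix[:, 1] >= 0) & (pix[:, 1] < height)
    return pix, in_front & inside          # pixel coords + visibility mask
```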
Conference Paper (full-text available)
SocialSync is a sub-frame synchronization protocol for capturing images simultaneously with a network of smartphone cameras. By synchronizing image captures to within a frame period, multiple smartphone cameras, which are often present in social settings, can be used for a variety of applications, including light field capture, depth estimation, and free-viewpoint television. Currently, smartphone camera networks are limited to capturing static scenes because of motion artifacts caused by frame misalignment. Because this misalignment stems from variability in the camera system, we characterize frame capture on mobile devices by analyzing the statistics of camera setup latency and frame delivery within an Android app. Next, we develop the SocialSync protocol to achieve sub-frame synchronization between devices by estimating frame capture timestamps to within millisecond accuracy. Finally, we demonstrate the effectiveness of SocialSync on mobile devices by reducing motion-induced artifacts when recovering the light field.
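The core idea, recovering a sub-frame offset from estimated capture timestamps, can be sketched as a phase comparison between two cameras running at the same frame period. The snippet below assumes timestamps on a shared clock and is an illustration of the principle only, not the SocialSync protocol itself.

```python
import numpy as np

def subframe_offset(ts_a, ts_b, period):
    """Estimate the sub-frame offset (seconds) between two cameras
    from their frame-capture timestamps (illustrative sketch,
    assuming a common clock and identical frame periods)."""
    # Phase of each camera's captures within the frame period.
    phase_a = np.median(np.mod(ts_a, period))
    phase_b = np.median(np.mod(ts_b, period))
    # Wrap the difference into [-period/2, period/2).
    d = phase_a - phase_b
    return (d + period / 2) % period - period / 2

# Two 30 fps cameras whose captures differ by ~7 ms:
period = 1 / 30
ts_a = np.arange(100) * period + 0.0031
ts_b = np.arange(100) * period + 0.0101
print(subframe_offset(ts_a, ts_b, period))  # ~ -0.007 s
```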
Article
We present a system for interactively browsing and exploring large unstructured collections of photographs of a scene using a novel 3D interface. Our system consists of an image-based modeling front end that automatically computes the viewpoint of each photograph, a sparse 3D model of the scene, and image-to-model correspondences. Our photo explorer uses image-based rendering techniques to smoothly transition between photographs, while also enabling full 3D navigation and exploration of the set of images and world geometry, along with auxiliary information such as overhead maps. Our system also makes it easy to construct photo tours of scenic or historic locations and to annotate image details, with annotations automatically transferred to other relevant images. We demonstrate our system on several large personal photo collections as well as images gathered from Internet photo-sharing sites.
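One standard ingredient of such smooth transitions is interpolation between the camera poses of the two photographs, linear in position and spherical in rotation. The sketch below shows that ingredient in isolation, using SciPy's rotation utilities; it is not the paper's rendering pipeline, and the names are hypothetical. The interpolated pose would then drive the image-based rendering step the abstract describes.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_pose(R0, t0, R1, t1, alpha):
    """Blend two camera poses for a photo-to-photo transition.

    R0, R1 : (3, 3) rotation matrices of the two views
    t0, t1 : (3,)   camera centers
    alpha  : blend factor in [0, 1]
    """
    # Rotation: spherical linear interpolation on the rotation group.
    slerp = Slerp([0.0, 1.0], Rotation.from_matrix(np.stack([R0, R1])))
    R = slerp(alpha).as_matrix()
    # Position: plain linear interpolation between camera centers.
    t = (1 - alpha) * np.asarray(t0) + alpha * np.asarray(t1)
    return R, t
```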
Conference Paper
We introduce the distributed camera model, a novel model for Structure-from-Motion (SfM). This model describes image observations in terms of light rays with origins and directions rather than pixels, and is therefore capable of describing a single camera or multiple cameras simultaneously as the collection of all observed light rays. We show that the distributed camera model is a generalization of the standard camera model, and we describe a general formulation and solution to the absolute camera pose problem that works for standard or distributed cameras. The proposed method computes a solution that is up to 8 times more efficient and is robust to rotation singularities in comparison with gDLS [21]. Finally, this method is used in a novel large-scale incremental SfM pipeline in which distributed cameras are accurately and robustly merged together. This pipeline is a direct generalization of traditional incremental SfM; however, instead of growing the reconstruction by adding one camera at a time, it grows by adding one distributed camera at a time. Our pipeline produces highly accurate reconstructions efficiently by avoiding the need for many bundle adjustment iterations, and is capable of computing a 3D model of Rome from over 15,000 images in just 22 minutes.
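The shift from pixels to rays can be illustrated with standard pinhole back-projection: a pixel plus the camera's calibration and pose yields a world-space ray with an origin and a direction, and pooling the rays of every camera in a rig yields one "distributed camera". The sketch below is a generic construction with hypothetical names, not the paper's implementation.

```python
import numpy as np

def pixel_to_ray(u, v, K, R, c):
    """Back-project a pixel into a world-space ray (origin, direction).

    K : (3, 3) intrinsics
    R : (3, 3) camera-to-world rotation
    c : (3,)   camera center in world coordinates
    """
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # direction, camera frame
    d = R @ d_cam                                     # rotate into world frame
    return c, d / np.linalg.norm(d)

# A single pinhole camera is the special case in which every ray shares
# the same origin c; a multi-camera rig contributes rays with several origins.
```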
Chapter
Data from depth cameras are usually noisier and of lower resolution than data from standard cameras, so combining the two leads to more accurate 3D representations than depth cameras can provide alone. Moreover, the higher resolution of standard cameras can be exploited to obtain higher-resolution depth maps. The first part of this chapter is devoted to combining a depth camera with a single color camera, with the goal of improving depth quality and resolution through filtering, interpolation, or enhancement techniques guided by the color data. The second part considers the case of a stereo vision system assisting a depth camera. Here, two independent depth sources are combined to deliver a single output stream with improved characteristics compared to the original inputs. Various approaches for this task are presented, based both on fast local schemes suited to real-time operation and on more refined global optimization procedures.
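A representative example of the color-guided techniques discussed in the first part is joint bilateral upsampling: a high-resolution color image steers the interpolation of a low-resolution depth map so that depth discontinuities stay aligned with color edges. The following is a naive, unoptimized sketch of that idea; the function and its defaults are hypothetical, not taken from the chapter.

```python
import numpy as np

def joint_bilateral_upsample(depth_lr, gray_hr, scale,
                             sigma_s=1.0, sigma_r=10.0, radius=2):
    """Upsample a low-res depth map guided by a high-res grayscale
    image (naive O(H*W*r^2) sketch; float arrays, integer scale).

    depth_lr : (h, w) low-resolution depth
    gray_hr  : (h*scale, w*scale) grayscale guide image
    """
    h, w = depth_lr.shape
    H, W = gray_hr.shape
    out = np.zeros((H, W))
    for Y in range(H):
        for X in range(W):
            yc, xc = Y / scale, X / scale      # position on the low-res grid
            acc = norm = 0.0
            for j in range(int(yc) - radius, int(yc) + radius + 1):
                for i in range(int(xc) - radius, int(xc) + radius + 1):
                    y, x = min(max(j, 0), h - 1), min(max(i, 0), w - 1)
                    # Spatial weight: nearby low-res samples count more.
                    ws = np.exp(-((y - yc) ** 2 + (x - xc) ** 2)
                                / (2 * sigma_s ** 2))
                    # Range weight: guide-color similarity keeps depth
                    # edges aligned with color edges.
                    g = gray_hr[min(y * scale, H - 1), min(x * scale, W - 1)]
                    wr = np.exp(-(gray_hr[Y, X] - g) ** 2 / (2 * sigma_r ** 2))
                    acc += ws * wr * depth_lr[y, x]
                    norm += ws * wr
            out[Y, X] = acc / norm
    return out
```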