Figure 7 - uploaded by Xiang Chen
Source publication
A comprehensive, intuitive, task-oriented three-dimensional coverage model for cameras and multi-camera networks using fuzzy sets is presented. The model captures the vagueness inherent in the concept of visual coverage. At present, the model can be used to evaluate, given a scene model and an objective, the coverage performance of a given camera o...
Context in source publication
Context 1
... the form $\dot{A}$ denotes a discrete fuzzy set sampled from a continuous fuzzy set $A$ (i.e. $\mu_{\dot{A}}(x) = \mu_A(x)$ for any $x \in \dot{A}$), and $|\dot{A}| = \sum_{x \in S_{\dot{A}}} \mu_{\dot{A}}(x)$ is the scalar cardinality of $\dot{A}$. In (17), $S_{\dot{C}_N} = S_{\dot{D}}$; that is, $\dot{C}_N$ and $\dot{D}$ are sampled on the same discrete "grid" of points in D.

The fuzzy coverage model and fuzzy vision graph have been implemented in an object-oriented software stack using Python. First, we have developed FuzzPy, a generic open-source Python library for fuzzy sets and graphs. Using this functionality, we have developed various classes for the fuzzy sets used in the model. The Camera class, initialized with the 14 model parameters, returns the μ-value of any spatial point in camera coordinates using continuous trapezoidal fuzzy sets to implement C. The MultiCameraSimple and MultiCamera3D classes build their respective discrete fuzzy sets from Camera objects and a supplied scene model S. Coverage performance m can be estimated given a discrete fuzzy set $\dot{D}$.

Prosilica EC-1350 cameras, with a sensor resolution of 1360 × 1024 pixels and a square pixel size of 4.65 μm, are employed. These are fitted with Computar M3Z1228C-MP manual varifocal lenses, with a focal length range of 12 mm to 36 mm and a maximum aperture ratio of 1:2.8. The intrinsic and extrinsic camera parameters are found using HALCON.

We demonstrate that the fuzzy coverage model C, for various cameras and vision tasks, yields a useful and accurate metric for coverage. Quantitative validation would require a representative cross-section of possible single-camera tasks – for example, various feature detection, object recognition, tracking, and motion analysis algorithms – to be tested for performance and compared to predictions from a model C built from application parameters based entirely on task requirements (see Section 7.1 for further discussion). This is beyond our scope at this stage; here, we demonstrate simple qualitative relationships between the fuzzy coverage model and the appearance of features in the associated image, showing that the model appears to reflect reality at least reasonably well.

Figure 5 shows the calibration target, with the axes of the world coordinate system. Based on the intrinsic and extrinsic parameters obtained from calibration, a fuzzy coverage model for this single camera is generated. Figure 6 shows a visualization of the model, with p ∈ D sampled in increments of 50 mm and π/2. The opacity of the directional points reflects the value of $\mu_{C_N}(p)$. The key points shown in Figure 5 are shown as blue points in Figure 6.

Figure 7 shows a typical set of images from our qualitative single-camera model validation experiments. We use the legibility of text on the face of a ruler, as judged by a human, as a qualitative measure of imaging quality. This might be considered analogous to automated feature detection or character recognition. In this case, a specific point on the face of the ruler is positioned at three distinct spatial positions (each row shows one position), and the face is oriented at three distinct angles (each column shows one angle). Table 1 shows the $\mu_{C_S}$ membership values given by the in-scene model (with parameters γ = 20, R₁ = 3.0, R₂ = 0.5, c_max = 0.048, and ζ = 1.2 set in advance), and compares them to the average of 7 human respondents' estimations of legibility from the same images, normalized from an integer scale of 0–5.
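For concreteness, the discrete coverage computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the FuzzPy API; in particular, it assumes that (17) takes the common form $m = |\dot{C}_N \cap \dot{D}| / |\dot{D}|$ with the standard min t-norm intersection, which is consistent with the sampling condition $S_{\dot{C}_N} = S_{\dot{D}}$ above but is not quoted from the paper.

```python
# Minimal sketch of discrete fuzzy sets and the coverage metric,
# assuming (17) has the form m = |C_N ∩ D| / |D| with the standard
# min t-norm. Names are illustrative, not the FuzzPy API.

def scalar_cardinality(mu):
    """|A| = sum of membership values over the sample grid."""
    return sum(mu.values())

def fuzzy_intersection(mu_a, mu_b):
    """Min t-norm intersection of two discrete fuzzy sets sampled on
    the same grid of points (S_CN = S_D in the text above)."""
    return {p: min(mu_a[p], mu_b[p]) for p in mu_a}

def coverage_performance(mu_cn, mu_d):
    """Estimate m for a network coverage set C_N over a task set D."""
    overlap = fuzzy_intersection(mu_cn, mu_d)
    return scalar_cardinality(overlap) / scalar_cardinality(mu_d)

# Three sample points along x (mm); D fully relevant everywhere.
mu_cn = {(0, 0, 0): 0.9, (50, 0, 0): 0.4, (100, 0, 0): 0.0}
mu_d  = {(0, 0, 0): 1.0, (50, 0, 0): 1.0, (100, 0, 0): 1.0}
print(coverage_performance(mu_cn, mu_d))  # 1.3 / 3.0 ≈ 0.433
```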
We demonstrate that the relationships between coverage performance values of various network configurations are sound and useful. As in the single-camera experiments, we take a qualitative approach at present, judging the perceived coverage in images from a variety of configurations and scenarios. Figure 8 shows a typical set of images from our qualitative multi-camera model validation experiments. In this case, the non-directional coverage of eight small objects, placed at points $p_1$ through $p_8$ around the periphery of an occluding box, is tested for various combinations of up to six cameras. Table 2 shows the simple multi-camera network results, using γ = 20, R₁ = 3.0, R₂ = 0.5, c_max = 0.0048, and neglecting direction (no $C_D$ component). D is specified as having $\mu_D = 1$ in a region extending 200 mm outward from the box in x and y and the full height of the box in z (and 0 otherwise). The numbered networks 1–6 are the individual cameras, A is the network of all 6 cameras, and the remainder are defined as F = {1, 2}, G = {3, 4, 5}, and H = {5, 6}.

Some properties of the model are immediately clear from these results. First, the coverage degrees of $p_1$ through $p_8$ in A, F, G, and H are limited to the maximum of those of their constituent cameras. Second, in all cases where $N_i \subseteq N_j$ for two networks $N_i$ and $N_j$, $m(N_i, D) \le m(N_j, D)$. One can also see clearly that despite having 3 cameras, G performs more poorly than the 2-camera networks F and H; this is consistent with the m values computed for the individual cameras from these networks. In general, the model gives a clear, intuitive, and accurate indication of coverage performance for the specified region.

In order to accurately evaluate coverage for a given task, the application parameters need to be chosen appropriately. An inherent advantage of models based on fuzzy sets is that such parameters need not be particularly precise, since the relative structure of results tends to be quite robust [11]. Though the importance and effects of the parameters will vary depending on the task, possible considerations ...
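The two properties noted above (per-point coverage bounded by the best constituent camera, and $m(N_i, D) \le m(N_j, D)$ whenever $N_i \subseteq N_j$) are exactly what one obtains if network coverage is composed as the pointwise maximum, i.e. the standard fuzzy union, of the per-camera membership values. The sketch below illustrates this reading; the names and values are illustrative, not the FuzzPy API.

```python
# Sketch of a multi-camera composition consistent with the observations
# above: network coverage as the pointwise maximum (standard fuzzy
# union) of per-camera membership values.

def network_coverage(camera_mus):
    """Combine per-camera discrete coverage sets {point: mu} by max."""
    points = camera_mus[0].keys()
    return {p: max(mu[p] for mu in camera_mus) for p in points}

# Toy per-camera coverage degrees at three of the test points.
cams = {
    1: {"p1": 0.8, "p2": 0.0, "p3": 0.1},
    2: {"p1": 0.2, "p2": 0.6, "p3": 0.0},
    3: {"p1": 0.0, "p2": 0.1, "p3": 0.5},
}
F = network_coverage([cams[1], cams[2]])   # F = {1, 2}
A = network_coverage(list(cams.values()))  # all cameras

# Max-composition implies the subset bound observed above: adding
# cameras can never decrease the coverage degree of any point.
assert all(F[p] <= A[p] for p in F)
print(F, A)
```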
Similar publications
Skin color reproduction is becoming increasingly important with the recent progress in various imaging systems. In this paper, based on subjective experiments, correlation maps are analyzed between the appearance of Japanese facial images and the amounts of melanin and hemoglobin components in the facial skin. Facial color images were taken by digital still cam...
This paper deals with pose estimation using an iterative scheme. We show that using adequate visual information, pose estimation can be performed iteratively with only three independent unknowns, which are the translation parameters. Specifically, an invariant to rotational motion is used to estimate the camera position. In addition, an adequate tr...
This paper builds on existing work on learning protocol behaviour from observation to propose a new framework for visual attention. The main contribution of this work resides in the fact that attention is not given a priori to the vision system but learned by induction from the active observation of patterns in space. These patterns are sequence...
We propose in this work to synthesize online the visual content of a complex video into multiple sprites, while detecting on-the-fly each sprite's limits. For each received frame, physically meaningful camera rotation angles and focal lengths are first estimated from the frame-to-sprite homography. These physical parameters are then used by a thresho...
As an important type of geometric structure information in image processing, line features are significant in visual navigation, three-dimensional object structure contour extraction, etc. Current excellent line extraction methods offer fast extraction speeds and good extraction results. However, most of them share a problem: a long line is easily fragmented....
Citations
... In Pascal Straub's thesis [130], we found that the most influential parameter for this function is the ratio of pixels in the view to be reconstructed that are also visible in the remaining encoded views. Following Zhang et al. [13], who used that coverage to optimize the distribution of the movable cameras in their camera array for maximum image quality, we used the same definition from Mavrinac et al. [131]. They define the covered points in multi-camera systems, C, as the points included in four sets: $C_V$ is the set of fully or partially visible points, $C_R$ denotes the points covered by a sufficient number of pixels, $C_F$ includes all points in focus, and $C_D$ contains visible points based on the direction of the camera relative to the scene. ...
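For illustration, if the four component sets are combined by fuzzy intersection with the standard min t-norm (assumed here; [131] may use a different composition), the overall coverage degree of a point is bottlenecked by its weakest component:

```python
# Overall coverage degree of one directional point as the min t-norm
# intersection C = C_V ∩ C_R ∩ C_F ∩ C_D (composition assumed, and the
# membership values below are invented for illustration).

def coverage_degree(mu_v, mu_r, mu_f, mu_d):
    """Membership in C is bottlenecked by the weakest component."""
    return min(mu_v, mu_r, mu_f, mu_d)

# Fully visible and in focus, but marginally resolved and viewed at a
# steep angle: low overall coverage.
print(coverage_degree(mu_v=1.0, mu_r=0.3, mu_f=0.9, mu_d=0.5))  # 0.3
```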
Multi-camera footage contains far more data than conventional video. While the additional data enables a number of new effects that previously required a large amount of CGI magic and manual labor to achieve, it can easily overpower consumer hardware and networks. In this thesis, we explore the necessary steps to create an interactive multiview streaming system, from the cameras via the compression and streaming of the material, to the view interpolation to create immersive perspective shifts for viewers. By only using freely available consumer hardware, and making sure all steps can run in real time in combination with the others, the benefits of multi-camera video are made available to a wider public. With the construction of a modular camera array for lightfield recording, we highlight the most important properties of such an array to allow for good post-processing of the recorded data. This includes a flexible yet sturdy frame, the management of computational and storage resources, as well as the required steps to make the raw material ready for further processing. The Unfolding scene displays the possibilities of lightfields when a good camera array is combined with the talent of professional visual effects artists for the creation of future cinematic movies. Furthermore, we explore the benefits of 5D-lightfield video for scenes with fast motion, using precisely controlled time delays between the shutters of different cameras in the capturing array.
... All the pixels that cannot be filled from other views, have to be filled using image inpainting techniques and are therefore very prone to errors. The coverage is modeled using a simplified version of the fuzzy sets described by Mavrinac et al. in [7], omitting the focus and resolution influences as we expect the cameras to have the same resolution and focus settings. This way the scene composition as well as the differences in camera position and viewing angle are taken into account. ...
In this paper an approach to optimize the transmission of multiview video over bandwidth-limited channels is presented. The main idea is to reduce the number of transmitted views by choosing a set of views that can be reconstructed with the highest quality at the receiver using view interpolation, removing them from the stream, and encoding the remaining views with a higher quality. We have shown that at the moment this approach only makes sense in cases where the channel is very restricted; however, as view interpolation algorithms improve in quality, the threshold at which our method improves image quality rises quickly, and the method can be applied to a wider range of streaming scenarios.
... It is desirable to eventually relax all three of these restrictions in further study on the topic. We also consider direct validation of the coverage model outside our scope, and direct the reader to our earlier validation work [Mavrinac et al. 2010a, 2010b, 2011]. ...
The problem of online selection of monocular view sequences for an arbitrary task in a calibrated multi-camera network is investigated. An objective function for the quality of a view sequence is derived from a novel task-oriented, model-based instantaneous coverage quality criterion and a criterion of the smoothness of view transitions over time. The former is quantified by a priori information about the camera system, environment, and task generally available in the target application class. The latter is derived from qualitative definitions of undesirable transition effects. A scalable online algorithm with robust suboptimal performance is presented based on this objective function. Experimental results demonstrate the performance of the method—and therefore the criteria—as well as its robustness to several identified sources of nonsmoothness.
... In this section an abridged description of the coverage model is provided. This model was developed in our previous work in Mavrinac et al. (2010). Validation of the model is provided in Alarcon-Herrera et al. (2011). ...
A method for PTZ camera reconfiguration oriented toward tracking applications and surveillance systems is presented. The visual constraints are transformed into geometric constraints by a coverage model, and the final PTZ configurations are averaged by a consensus algorithm. The approach is to design a distributed algorithm that enables cooperation between the cameras. Experimental results show successful camera handoff.
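The paper's specific consensus scheme is not reproduced in this excerpt; as a rough illustration, here is a plain distributed-averaging sketch over the camera communication graph (topology, step size, and iteration count are all assumptions):

```python
# Plain distributed averaging: each camera repeatedly nudges its PTZ
# parameter toward its neighbors' values. With a step size below
# 1 / (max node degree), all values converge to the network average.

def consensus(values, neighbors, eps=0.3, iters=200):
    v = dict(values)
    for _ in range(iters):
        v = {i: v[i] + eps * sum(v[j] - v[i] for j in nbrs)
             for i, nbrs in neighbors.items()}
    return v

pan = {1: 10.0, 2: 30.0, 3: 20.0}      # initial pan angles (degrees)
graph = {1: [2], 2: [1, 3], 3: [2]}    # line communication topology
print(consensus(pan, graph))           # all values converge to 20.0
```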
... A substantial component of our contribution is the task-oriented model of multi-camera coverage presented in Section II, which builds upon our previous work in this area. Direct validation of the coverage model is outside the scope of this work; we direct the reader to our validation of earlier formulations of the model [12], [13], [14], and intend to publish further results focusing exclusively on validation for a diversity of tasks. The problem statement, the view selection algorithm itself, and implementation remarks are presented in Section III. ...
A method for real-time selection of optimal view sequences for a task in a calibrated multi-camera system is presented. Selection is based on a multi-camera coverage model constructed from a priori information about the cameras, task, and scene, which are assumed to be available in the relatively controlled environments of the target application class. Experimental results demonstrate the generality and effectiveness of the algorithm and explore trade-offs in parameterization.
... The scale and performance of most tasks in multi-camera networks (indeed, in sensor networks generally) are directly related to the volume of coverage of the sensor(s) in question. In previous work [4], we developed a real-valued coverage model for multi-camera systems, inspired by task-oriented sensor planning models from the computer vision literature [5] and by coverage models used for various purposes in wireless sensor networks [6], [7]. We demonstrate that, given a set of a priori parameters of the multi-camera system and some task requirements, this model accurately describes the true coverage of a scene in the context of the task. ...
... First, we present a single-camera parameterization of the coverage strength model, for which the full theoretical derivation can be found in [4]. ...
... A set of single-camera models may be placed in the context of a world coordinate frame and a scene, and then combined into multi-camera coverage models. Again, theoretical details may be found in [4]. ...
A new topological model of camera network coverage, based on a weighted hypergraph representation, is introduced. The model's theoretical basis is the coverage strength model, presented in previous work and summarized here. Optimal distribution of task processing is approximated by adapting a local search heuristic for parallel machine scheduling to this hypergraph model. Simulation results are presented to demonstrate its effectiveness.
... no external occlusion is allowed, etc.). Our previously developed coverage model [2], [1] is well suited to a generate-and-test approach, and our coverage metric has been shown to closely reflect the task's a posteriori performance [2], [3], [1]. Currently there exists no feasible technique for numerical optimization using this model; in this paper we employ the generate-and-test approach to perform sensor planning. ...
... In previous work, Mavrinac et al. [2], [3], [1] developed a coverage strength model which includes most of the camera's characteristics and properties; among these are the extrinsic and intrinsic parameters as well as the optical properties of the lens, the camera's sensor and several intuitive task parameters which will be described in this section. ...
A method for sensor planning based on a previously developed coverage strength model is presented. The approach taken is known as generate-and-test: a feasible solution is predefined and then tested using the coverage model. The relationship between the resolution of the imaging system and its performance is the key component to perform sensor planning of range cameras. Experimental results are presented; the inverse correlation between coverage performance and measurement error demonstrates the usefulness of the model in the sensor planning context.
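A minimal sketch of the generate-and-test loop in Python, assuming the candidate configurations and a coverage evaluation function are supplied externally (the configuration fields and stand-in scores below are invented for illustration):

```python
# Generate-and-test: every predefined feasible configuration is scored
# with the coverage model, and the best performer is kept. coverage_fn
# stands in for evaluating m(C, D) with the fuzzy coverage model.

def generate_and_test(candidates, coverage_fn):
    """Return the candidate configuration with the highest coverage."""
    best = max(candidates, key=coverage_fn)
    return best, coverage_fn(best)

# Invented candidate range-camera poses and stand-in m values.
candidates = [
    {"standoff_mm": 300, "tilt_deg": 0},
    {"standoff_mm": 400, "tilt_deg": 15},
    {"standoff_mm": 500, "tilt_deg": 30},
]
fake_m = {300: 0.62, 400: 0.81, 500: 0.54}
score = lambda c: fake_m[c["standoff_mm"]]
print(generate_and_test(candidates, score))  # the 400 mm pose wins
```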
... This requires a function that "gracefully" degrades from fully visible status to invisible. A fuzzy model has been introduced in [82], which can be easily incorporated into a binary integer programming model. All we ...
Wide-area camera networks are becoming more and more common. They have a wide range of commercial and military applications, from video surveillance to smart homes and from traffic monitoring to anti-terrorism. The design of such a camera network is a challenging problem due to the complexity of the environment, self- and mutual occlusion of moving objects, diverse sensor properties, and a myriad of performance metrics for different applications. In this dissertation, we consider two such challenges: camera planning and camera fusion. Camera planning determines the optimal number and placement of cameras for a target cost function. Camera fusion describes the task of combining images collected by heterogeneous cameras in the network to extract information pertinent to a target application.
I tackle the camera planning problem by developing a new unified framework based on binary integer programming (BIP) to relate the network design parameters and the performance goals of a variety of camera network tasks. Most of the BIP formulations are NP-hard problems, and various approximate algorithms have been proposed in the literature. In this dissertation, I develop a comprehensive framework for comparing the entire spectrum of approximation algorithms, from greedy heuristics and Markov chain Monte Carlo (MCMC) to various relaxation techniques. The key contribution is to provide not only a generic formulation of the camera planning problem but also novel approaches to adapt the formulation to powerful approximation schemes, including simulated annealing (SA) and semi-definite programming (SDP). The accuracy, efficiency, and scalability of each technique are analyzed and compared in depth. Extensive experimental results are provided to illustrate the strengths and weaknesses of each method.
The second problem, heterogeneous camera fusion, is a very complex problem. Information can be fused at different levels, from pixels or voxels to semantic objects, with large variation in accuracy, communication, and computation costs. My focus is on the geometric transformation of shapes between objects observed at different camera planes. This so-called geometric fusion approach usually provides the most reliable fusion at the expense of high computation and communication costs. To tackle the complexity, a hierarchy of camera models with different levels of complexity was proposed to balance the effectiveness and efficiency of the camera network operation. Then different calibration and registration methods are proposed for each camera model. Finally, I provide two specific examples to demonstrate the effectiveness of the model: 1) a fusion system to improve the segmentation of the human body in a camera network consisting of thermal and regular visible-light cameras, and 2) a view-dependent rendering system that combines information from depth and regular cameras to collect scene information and generate new views in real time.
While the theoretical foundation of the optimal camera placement problem has been studied for decades, its practical implementation has recently attracted significant research interest due to the increasing popularity of visual sensor networks. The most flexible formulation of finding the optimal camera placement is based on a binary integer programming (BIP) problem. Despite the flexibility, most of the resulting BIP problems are NP-hard and any such formulations of reasonable size are not amenable to exact solutions. There exists a myriad of approximate algorithms for BIP problems, but their applications, efficiency, and scalability in solving camera placement are poorly understood. Thus, we develop a comprehensive framework in comparing the merits of a wide variety of approximate algorithms in solving the optimal camera placement problems. We first present a general approach of adapting these problems into BIP formulations. Then, we demonstrate how they can be solved using different approximate algorithms including greedy heuristics, Markov-chain Monte Carlo, simulated annealing, and linear and semidefinite programming relaxations. The accuracy, efficiency, and scalability of each technique are analyzed and compared in depth. Extensive experimental results are provided to illustrate the strength and weakness of each method.
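To make the BIP structure concrete, here is a toy sketch: a binary variable per candidate camera, a 0/1 coverage matrix relating cameras to scene points, and a camera budget. Brute-force enumeration stands in for the greedy, MCMC, SA, and SDP machinery the abstract describes, and is only tractable for tiny instances.

```python
# Toy BIP camera placement: choose at most k of the candidate cameras
# to maximize the number of covered scene points. a[i][j] = 1 if
# candidate camera j covers scene point i. Brute force is exponential;
# the approximation schemes discussed above exist precisely because
# realistic instances cannot be enumerated like this.

from itertools import combinations

def solve_bip_placement(a, k):
    n_points, n_cams = len(a), len(a[0])
    best, best_cov = (), -1
    for chosen in combinations(range(n_cams), k):
        covered = sum(any(a[i][j] for j in chosen) for i in range(n_points))
        if covered > best_cov:
            best, best_cov = chosen, covered
    return best, best_cov

# 4 scene points, 3 candidate cameras, budget k = 2.
a = [[1, 0, 0],
     [1, 1, 0],
     [0, 1, 0],
     [0, 0, 1]]
print(solve_bip_placement(a, 2))  # ((0, 1), 3): any pair covers 3 of 4
```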
An automatic method for solving the problem of view planning in high-resolution industrial inspection is presented. The method's goal is to maximize visual coverage and to minimize the number of cameras used for inspection. Using a CAD model of the object of interest, we define the scene points and the viewpoints, with the latter being the solution space. The problem formulation accurately encapsulates all the vision- and task-related requirements of the design process for inspection systems. We use a graph-based approach to formulate a solution for the problem. The solution is implemented as a greedy algorithm, and the method is validated through experiments.
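A minimal sketch of the greedy step, assuming per-viewpoint visibility has already been computed from the CAD model (the covers mapping and point labels below are stand-ins):

```python
# Greedy view planning: repeatedly pick the viewpoint that covers the
# most not-yet-covered scene points, until no viewpoint adds coverage.
# This is the classic greedy set-cover heuristic; covers[] is a
# stand-in for visibility derived from the CAD model.

def greedy_view_planning(scene_points, viewpoints, covers):
    uncovered = set(scene_points)
    plan = []
    while uncovered:
        best = max(viewpoints, key=lambda v: len(covers[v] & uncovered))
        gain = covers[best] & uncovered
        if not gain:  # remaining points are visible from no viewpoint
            break
        plan.append(best)
        uncovered -= gain
    return plan

covers = {"v1": {1, 2, 3}, "v2": {3, 4}, "v3": {4, 5, 6}}
print(greedy_view_planning({1, 2, 3, 4, 5, 6}, list(covers), covers))
# ['v1', 'v3'] covers all six scene points
```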