Article

Abstract

This paper describes a hardware system and the underlying algorithms that were developed for real-time stereoscopic videoconferencing with viewpoint adaptation within the European PANORAMA project. The goal was to achieve a true telepresence illusion for the remote partners. For this purpose, intermediate views at arbitrary positions must be synthesized from the views of a stereoscopic camera system with a rather large baseline. The actual viewpoint is adapted according to the head position of the viewer, such that the impression of motion parallax is produced. The whole system consists of a disparity estimator, stereoscopic MPEG-2 encoder, disparity encoder and multiplexer at the transmitter side, and a demultiplexer, disparity decoder, MPEG-2 decoder and interpolator with viewpoint adaptation at the receiver side. For transmission of the encoded signals, an ATM network is provided. In the final system, autostereoscopic displays will be used. The algorithms for disparity estimation, disparity encoding and disparity-driven intermediate viewpoint synthesis were specifically developed under the constraint of hardware feasibility.
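
As a rough illustration of disparity-driven intermediate view synthesis, the sketch below forward-warps one scan line of the left view towards a virtual viewpoint. It assumes rectified views with purely horizontal disparities. The function name `synthesize_scanline`, the normalised viewpoint parameter `s` (0 = left camera, 1 = right camera) and the sign convention are hypothetical and are not taken from the PANORAMA interpolator, which the abstract does not detail.

```python
from typing import List, Optional, Sequence


def synthesize_scanline(
    left_row: Sequence[float],
    disp_row: Sequence[float],
    s: float,
) -> List[Optional[float]]:
    """Forward-warp one scan line of the left view to viewpoint s.

    s = 0 reproduces the left view; s = 1 moves every pixel by its full
    left-to-right disparity. Positions that receive no pixel stay None
    (holes, for example in disoccluded areas).
    """
    width = len(left_row)
    out: List[Optional[float]] = [None] * width
    for x, (value, d) in enumerate(zip(left_row, disp_row)):
        # Assumed convention: a left-view pixel at x appears at x - d in the right view.
        target = round(x - s * d)
        if 0 <= target < width:
            out[target] = value  # later pixels simply overwrite earlier ones
    return out
```

In a head-tracked setup, `s` would be derived from the viewer's head position. Occlusion ordering and hole filling are deliberately left out of this sketch.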


... In this case, corresponding points in two views, which refer to a common 3D point, lie on the same horizontal scan line. Owing to this simplification, it was possible to develop very fast algorithms for real-time applications such as mobile robotics or interactive multimedia services [1] [2] [3]. In future applications using 3D image analysis, however, parallel camera configurations will be neither practical nor possible. ...
... A hierarchical block-matching approach, providing accurate localisation of the disparities combined with high disparity resolution [3], has been used as baseline algorithm for the comparative study in this paper. The MAD (mean of absolute differences) has been used as a similarity measure. ...
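
The MAD criterion mentioned above can be sketched as follows for a single resolution level. This is a plain full search along one scan line, not the hierarchical scheme of [3]; the names `mad` and `best_disparity`, the block half-width `half` and the convention that a left-view pixel at `x` matches `x - d` in the right view are assumptions made for illustration.

```python
from typing import Sequence

Image = Sequence[Sequence[float]]


def mad(left: Image, right: Image, x: int, y: int, d: int, half: int) -> float:
    """Mean of absolute differences between the block centred at (x, y) in the
    left image and the block shifted by d pixels to the left in the right image."""
    total = 0.0
    count = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
            count += 1
    return total / count


def best_disparity(left: Image, right: Image, x: int, y: int,
                   max_d: int, half: int = 1) -> int:
    """Return the disparity in [0, max_d] with the lowest MAD (full search).

    The caller must keep the block inside the image vertically and on the
    right-hand side; candidates that would leave the image on the left are skipped.
    """
    candidates = [d for d in range(max_d + 1) if x - d - half >= 0]
    return min(candidates, key=lambda d: mad(left, right, x, y, d, half))
```

A hierarchical variant would run the same search on downsampled images first and refine the result at each finer level within a small window around the propagated estimate.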
... Tab. 2 and Tab. 3 show a comparison of the computational load in terms of average numbers of multiplications and additions per pixel. These figures only take into account those operations which are relevant for online processing. ...
Conference Paper
Full-text available
In this paper, we present results of a comparative study on disparity analysis of convergent stereo systems. If the epipolar geometry is known, disparity analysis can be performed by computing the disparity along the epipolar line in the original views. Using a parallel stereo algorithm, the convergent views have to be rectified in order to have horizontal and parallel epipolar lines. In this study, both approaches are investigated with respect to the computational effort and the quality of the disparity analysis results.
... disparity stream: a left-to-right (L→R) vector field and a right-to-left (R→L) vector field. The disparity estimator implements a hierarchical block-matching scheme described in [2]. In order to reduce its complexity, the disparity estimator was optimized for head and shoulder scenes with a uniform background. ...
... In order to reduce its complexity, the disparity estimator was optimized for head and shoulder scenes with a uniform background. Both vector fields contain approximately the same information and are highly redundant, thus the disparity vectors are transformed to chain maps [2], [3] and compressed by the disparity encoder. When teleconferencing scenes are involved, vectors from both vector maps are used according to their position. ...
... When teleconferencing scenes are involved, vectors from both vector maps are used according to their position. Simulations showed that R→L disparities point more reliably in left occlusions (parts of the scene present only in the left-view image), while L→R disparities produce better image quality in right occlusions (parts of the scene present only in the right-view image) [2]. Therefore vectors from the right-to-left map are used for the left part of the image, and vice versa. ...
Article
This letter presents a real-time lossless compression/decompression unit for disparity map information used in 3D teleconferencing systems. A lossless compression algorithm is used to compress the disparity map data in real-time, resulting in a variable bit-rate data stream that has to be transmitted through a constant bit-rate channel. The system uses a controlled-data-loss method for data rate adaptation and for minimizing the loss of information.
... An interpolation step at the end of the processing chain produces 1/8 sub-pixel resolution. In the application field of video conferencing, a real-time system was developed by the European ACTS project PANORAMA in 1997 [18]. Dense disparity maps are calculated for images of size 720x576 pixels using a hierarchical block matching approach. ...
... This matching criterion is a shape adaptive version of the sum of absolute differences (SAD), which is used by many real-time algorithms [13][14][17][18][25]. Other matching criteria known from literature, like normalized cross correlation (NCC), usually provide better results, but they are computationally more expensive than SAD. ...
... In a lot of applications a simple horizontal interpolation is sufficient to close the holes in post-processed disparity maps [18]. However, simple interpolation only works in regions with homogeneous depth, whereas it can produce crucial artefacts at depth discontinuities, as can for example be seen in the right-to-left disparity map in Fig. ...
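
A minimal sketch of such a horizontal interpolation is given below. It assumes that holes in one row of the disparity map are marked with `None`; the function name `fill_holes_row` and the border handling (repeating the nearest valid value) are illustrative choices, not details taken from [18].

```python
from typing import List, Optional, Sequence


def fill_holes_row(row: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Close holes (None entries) in one disparity row by linear interpolation
    between the nearest valid neighbours on the same scan line."""
    filled: List[Optional[float]] = list(row)
    known = [i for i, v in enumerate(row) if v is not None]
    if not known:
        return filled  # nothing to interpolate from
    # Holes touching the image border repeat the nearest valid value.
    for i in range(known[0]):
        filled[i] = row[known[0]]
    for i in range(known[-1] + 1, len(row)):
        filled[i] = row[known[-1]]
    # Interior holes are interpolated linearly between their two valid neighbours.
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            filled[i] = row[a] + t * (row[b] - row[a])
    return filled
```

Because the interpolation blends foreground and background disparities, it produces exactly the kind of ramp across a depth discontinuity that the passage warns about.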
Article
Full-text available
Real-time stereo analysis is an important research area in computer vision. In this context, we propose a stereo algorithm for an immersive video-conferencing system by which conferees at different geographical places can meet under similar conditions as in the real world. For this purpose, virtual views of the remote conferees are generated and adapted to the current viewpoint of the local participant. Dense vector fields of high accuracy are required in order to guarantee an adequate quality of the virtual views. Due to the usage of a wide baseline system with strongly convergent camera configurations, the dynamic disparity range is about 150 pixels. Considering computational costs, a full search, or even a local search restricted to a small window of a few pixels as implemented in many real-time algorithms, is not suitable for our application, because processing of full-resolution video according to the CCIR 601 TV standard at 25 frames per second is addressed, most desirably as a pure software solution running on available processors without any support from dedicated hardware. Therefore, we propose in this paper a new fast algorithm for stereo analysis, which circumvents the window search by using a hybrid recursive matching strategy based on the effective selection of a small number of candidates. However, stereo analysis requires more than a straightforward application of stereo matching. The crucial problem is to produce accurate stereo correspondences in all parts of the image. In particular, errors in occluded regions and in homogeneous or weakly structured regions lead to disturbing artifacts in the synthesized virtual views. To cope with this problem, mismatches have to be detected and substituted by a sophisticated interpolation and extrapolation scheme.
... In contrast to [Ohm+98], a multiprocessor board that can be attached to standard PC hardware is used for disparity estimation. Additionally, the 3D processing of input from four cameras is supported in order to allow for a greater perspective change while focusing on a scalable multi-party scenario within a shared virtual table environment. ...
... Cox et al. [Cox+92]. Other authors propose the application of hierarchical block-matching schemes implemented in hardware solutions [Ohm+98] or even a global optimization of the disparity map in terms of post-processing the results of an adaptive window-based matching approach [Tsa+04]. [EE13,GSF14]. ...
Thesis
The lack of eye contact diminishes the impression of a natural communication situation in videoconferencing. While a person looks at the screen, they are recorded by cameras that are usually located right next to it. With the advent of massively parallel computer hardware, and in particular very powerful gaming graphics cards, it has become possible to process many input views for real-time 3D reconstruction. A larger number of input views mitigates occlusion problems and leads to more complete 3D data. This thesis proposes new algorithms that enable high-quality real-time 3D reconstruction, continuous adaptation of the photometric camera parameters, and user-independent estimation of the eye-contact cameras. The real-time 3D analysis consists of two complementary approaches. On the one hand, there is an algorithm based on the processing of geometric shapes; on the other hand, there is a patch-based technique that evaluates 3D hypotheses by comparing image textures. In preparation for image synthesis, textures from different views must be matched. For this purpose, a new algorithm for continuous photometric adjustment of the camera parameters is proposed. The photometric adjustment is carried out iteratively, alternating with a 3D registration of the corresponding views. In this way, the quality of the photometric parameters is directly linked to that of the 3D analysis results, and vice versa. A further important prerequisite for correct synthesis of the eye-contact view is the estimation of a suitable virtual eye-contact camera. To this end, the eye-contact camera is continuously adapted to the eye position of the users. In this way, a virtual communication environment is created that enables more natural communication.
... Several types of 3D display concepts have been developed over the years, ranging from the well-known glasses-based displays, to headtracked [1], volumetric [2], and e.g. multiview [3] displays. ...
... We aim at general, natural scenes as occurring in two applications: 3DTV recordings and medical operating room " black box " recordings [14]. For 3DTV, we require the ability to synthesize several viewpoints as needed by multi-viewpoint displays [3], with an image quality equal to current 2D broadcasts. For the medical application, we additionally require the ability to synthesize free viewpoints over a considerable viewing angle. ...
Article
Full-text available
This paper describes the motivation, goal and methods of a project on 3D video acquisition. The project runs in 2006-2009, and combines the expertise of Philips, active in consumer 3DTV applications, and Delft University of Technology, active in general image processing and 3D multi-camera processing. We focus on 3DTV and medical applications, for which we have contacts with Dutchview, a major Dutch TV production company, and LUMC, the Leiden University Medical Centre.
... DISTIMA has developed a system for capturing, coding, transmitting and presenting digital stereoscopic image sequences. Later, another project called PANORAMA was developed to enhance the visual information exchange in telecommunications with 3-D telepresence [2]. The two major achievements of these systems are the auto-stereoscopic display and the multi-viewpoint capability. ...
... There are two main purposes for existing stereoscopic video encoders. One is to interpolate intermediate images at the receiver so as to provide viewpoint adaptation [1][2] and the other is to provide the viewers the 3D feeling [5][6]. For the former case, high qualities are required for both image sequences and the disparity fields, thus the computational complexity for obtaining the disparity field and the bitrate are usually very high. ...
Conference Paper
In this paper, we propose an object-based MPEG-4 compatible stereoscopic video sequence encoder. We aim at efficient stereoscopic video compression for videoconferencing and 3D telepresence systems. The stereoscopic video sequence includes one main view and one auxiliary view. The main view is encoded using an MPEG-4 encoder and the auxiliary view is encoded by joint motion and disparity compensation. After the input sequences are balanced to compensate for lighting conditions and camera differences, the joint disparity and motion regularization is performed on a VOP basis. The output of the encoder contains two bitstreams, a main bitstream, which can be decoded by a standard MPEG-4 decoder, and an auxiliary bitstream. Simulation results show that the joint estimation and regularization of disparity and motion fields provide more accurate vector fields and efficient compression for the auxiliary stream. The proposed system achieves high image quality at lower bitrate than existing stereoscopic video encoders.
... (2) the max-min angle property: for any two adjacent triangles that form a convex quadrilateral, the diagonal chosen by the Delaunay triangulation yields a smallest angle, taken over the six interior angles of the two triangles, that is no smaller than the smallest angle obtained with the other diagonal [3]. ...
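
The max-min angle property can be checked numerically with a small sketch. It assumes a convex quadrilateral given as four vertices `a, b, c, d` in order around the boundary; the function names are hypothetical, and the code only compares the two possible diagonals rather than building a full triangulation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def angles(a: Point, b: Point, c: Point) -> List[float]:
    """Interior angles (in radians) of triangle abc at a, b and c."""
    def angle_at(p: Point, q: Point, r: Point) -> float:
        v1 = (q[0] - p[0], q[1] - p[1])
        v2 = (r[0] - p[0], r[1] - p[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        cos = dot / (math.hypot(*v1) * math.hypot(*v2))
        return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding
    return [angle_at(a, b, c), angle_at(b, c, a), angle_at(c, a, b)]


def min_angle_for_diagonal(quad: Sequence[Point], diagonal: str) -> float:
    """Smallest of the six angles when the quadrilateral is split along 'ac' or 'bd'."""
    a, b, c, d = quad
    triangles = [(a, b, c), (a, c, d)] if diagonal == "ac" else [(a, b, d), (b, c, d)]
    return min(angle for tri in triangles for angle in angles(*tri))


def delaunay_diagonal(quad: Sequence[Point]) -> str:
    """Return the diagonal ('ac' or 'bd') that maximises the minimum angle."""
    return max(("ac", "bd"), key=lambda g: min_angle_for_diagonal(quad, g))
```

For the kite with vertices (0, 0), (2, -1), (4, 0) and (2, 1), splitting along the short diagonal `bd` avoids the two thin triangles that the long diagonal `ac` would create, so `bd` is selected.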
... In [61], Schreer, Brandburg and Kauff present a comparative study of disparity estimation on rectified and unrectified views. In both cases, the same hierarchical block-matching approach, described in [52], is applied. The first comparison in this study concerns complexity. ...
Article
This thesis deals with camera motion estimation, when the camera films a static scene, from the obtained sequence of images. The proposed method concerns motion estimation between two adjacent frames and is based on the determination of a 2D quadratic deformation between images. From the motion estimation, we next study the problem of scene structure estimation. We apply Belief Propagation method directly on an images couple, without any rectification, just using motion estimation. Finally, we study the injectivity of the map that associates an optical flow to camera motion and scene structure. Given two camera motions, we describe the domain where the two flows can be identical and the surfaces leading to these ambiguous flows.
... The system facilitates certain eye-to-eye contact and motion parallax effects. It is limited, however, to a 'head and shoulders' scene with a small-sized display and two-way point-to-point single-person communications [19]. Body gestures, postures and some socially important phenomena are not taken into account. ...
Article
An integrated media system is a computer-based environment that supports the creation, sharing, distribution and effective communication of multimodal information across the boundaries of space and time. The EU Information Societies Technology (IST) project — VIRTUE (Virtual Team User Environment) is working steadily towards the realisation of most aspects and properties of such a system, with particular emphasis on a three-way semi-immersive telepresence videoconferencing scenario. In contrast with the traditional videoconferencing system that we know now, the outcome of the project is expected to demonstrate distinctive presence features and experience for the conference participants. These include views of full-body-size realistically rendered images, eye-to-eye contact, gaze awareness, normal hand gesturing and direct body language. The purpose of this paper is to describe the current work in its related technical field, and the main objectives and scope of this project. One optional software system framework is outlined, and also illustrated are some component technologies in 3-D computer vision analysis that are being developed. The application of these component technologies, notably the dense-disparity estimation and the novel view synthesis, in 3-D interactive video manipulation and visualisation, are widely expected.
... The applications for such a system are obviously enormous. Immersive video conferencing can enhance the effectiveness of interpersonal communication [1][2], 3DTV and display systems increase the impact of news, movies and advertising [3][4][5], and 3D mixed reality techniques enable remote surgery or expert consultancy in medical areas and provide a means for remote maintenance in hazardous environments [6][7]. The most important problem in realizing these kinds of systems is to reconstruct the 3D coordinates of the captured scene. ...
Article
Full-text available
We propose a fast depth reconstruction algorithm for stereo sequences using camera geometry and disparity estimation. In the disparity estimation process, we calculate dense background disparity fields in an initialization step so that only disparities of moving object regions are updated in the main process using real-time segmentation and hierarchical disparity estimation techniques. The estimated dense disparity fields are converted into depth information by camera geometry. Experimental results show that the proposed algorithm provides accurate depth information with an average processing speed of 15 frames/sec for 320x240 stereo sequences on a common PC. We also verified the performance of the proposed algorithm by applying it to real applications.
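
The conversion from disparity to depth can be illustrated for the simplest case. The sketch assumes a rectified, parallel camera pair, for which depth is Z = f·B/d; the abstract does not state which camera geometry the authors use, so the function name `depth_from_disparity` and its parameters are illustrative only.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth in metres for a rectified, parallel stereo pair: Z = f * B / d.

    disparity_px: horizontal disparity in pixels (must be positive)
    focal_px:     focal length expressed in pixels
    baseline_m:   distance between the two camera centres in metres
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px
```

With a focal length of 800 pixels and a baseline of 0.1 m, a disparity of 40 pixels corresponds to a depth of 2 m. Depth resolution degrades with distance because equal disparity steps cover ever larger depth intervals.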
... We have investigated the I3D technique with different stereoscopic [14] and 3-camera sequences [17] of head-and-shoulder type, for which the assumption of convex object shape approximately holds true [9]. For disparity estimation and segmentation, we have used the system described in [3], which also exists as a hardware implementation [18]. For encoding, we have used the MPEG-4 software provided by MOMUSYS. ...
Article
Compression of stereoscopic and multiview video data is important, because the bandwidth necessary for storage and transmission increases linearly with the number of camera channels. This paper gives an overview of techniques that ISO's Moving Picture Experts Group has defined in the MPEG-2 and MPEG-4 standards, or that can be applied in the context of these standards. A good tradeoff between exploitation of spatial and temporal redundancies can be obtained by application of hybrid coding techniques, which combine motion-compensated prediction along the temporal axis and 2D DCT transform coding within each image frame. The MPEG-2 multiview profile extends hybrid coding towards exploitation of inter-view-channel redundancies by implicitly defining disparity-compensated prediction. The main feature of the new MPEG-4 multimedia standard with respect to video compression is the possibility to encode objects with arbitrary shape separately. As one component of the segmented object's shape, it shall be possible to encode a dense disparity map, which can be accurate enough to allow generation of alternative views by projection. This way, a very high stereo/multiview compression ratio can be achieved. While the main application area of the MPEG-2 multiview profile shall be in stereoscopic TV, it is expected that multiview aspects of MPEG-4 will play a major role in interactive applications, e.g. navigation through virtual 3D worlds with embedded natural video objects.
... Passive 3-D sensors generally use a two-camera rig in conjunction with a stereo vision approach (for example [16], [2], [4], [3]). These systems are not currently in common use in a studio environment, mainly due to their restrictions in accuracy and robustness. ...
Article
Full-text available
Virtual production for broadcast is currently mainly used in the form of virtual studios, where the resulting media is a sequence of 2-D images. With the steady increase of 3-D computing power in home PCs and the technical progress in 3-D display technology, the content industry is looking for new kinds of programme material, which makes use of 3-D technology. The applications range from analysis of sport scenes, 3DTV, up to the creation of fully immersive content.
... Ohm et al. describe a hardware system for stereoscopic videoconferencing over ATM networks [95]. The main focus of the paper is on disparity estimation and it is unclear whether the proposed system was ever implemented. ...
Article
Full-text available
An attempt is made to contribute to the realization of a flexible framework for video-mediated communication over the Internet by presenting scalable and adaptive algorithms for multicast flow control, layered video coding, and robust transport of video. Enrichments of video-mediated communication, in the shape of stereoscopic video transmission mechanisms and mobility support, are proposed along with design and implementation guidelines. Furthermore, the scope of Internet video is broadened through the introduction of a novel video gateway technology interconnecting multicast videoconferences with the World Wide Web.
... While this field of technology is relatively new, a hardware system for the real time stereoscopic video conferencing with viewpoint adaptation has been already realized [61]. ...
Article
Full-text available
SUMMARY This paper surveys the results of various studies on 3-D image coding. Themes are focused on efficient compression and display-independent representation of 3-D images. Most of the works on 3-D image coding have been concentrated on the compression methods tuned for each of the 3-D image formats (stereo pairs, multi-view images, volumetric images, holograms and so on). For the compression of stereo images, several techniques concerned with the concept of disparity compensation have been developed. For the compression of multi-view images, the concepts of disparity compensation and epipolar plane image (EPI) are the efficient ways of exploiting redundancies between multiple views. These techniques, however, heavily depend on the limited camera configurations. In order to consider many other multi-view configurations and other types of 3-D images comprehensively, more general platform for the 3-D image representation is introduced, aiming to outgrow the framework of 3-D "image" communication and to open up a novel field of technology, which should be called the "spatial" communication. Especially, the light ray based method has a wide range of application, including efficient transmission of the physical world, as well as
... Wide-baseline stereo has been investigated [4], but not necessarily in real-time applicative contexts. Several real-time stereo systems have been built around the parallel-camera configuration to minimise computational complexity (see [2], [5]). The proposed algorithm, the so-called hybrid recursive matching (HRM) provides sparse realtime disparity maps reduced by a factor of 8 with respect to full-size ITU-Rec. ...
Article
Klinefelter syndrome (47, XXY) is the most common cause of male hypogonadism, and it can be seen together with several endocrinologic diseases such as diabetes mellitus, osteoporosis, and various thyroid diseases. We present a patient with Klinefelter syndrome and thyroid hemiagenesis, a combination that has not been reported previously.
... Hardware implementations for 3D applications either address classic stereoscopic video [14] or autostereoscopic systems. In the latter case, they mainly focus on two-camera systems [23], or they address the acceleration of multi-camera rendering [29]. ...
Article
Full-text available
Integral imaging is a promising technique for delivering high-quality three-dimensional content. However, the large amounts of data produced during acquisition prohibit direct transmission of Integral Image data. A number of highly efficient compression architectures are proposed today that outperform standard two-dimensional encoding schemes. However, real-time compression for quality-demanding applications remains a primary concern for existing Integral Image encoders. In this work we propose a real-time FPGA-based encoder for Integral Image and integral video content transmission. The proposed encoder is based on a highly efficient compression algorithm used in Integral Imaging applications. Real-time performance is achieved by realizing a pipelined architecture, taking into account the specific structure of an Integral Image. The required memory access operations are minimized by adopting a systolic concept of data flow through the core processing elements, further increasing the performance boost. The encoder targets real-time, broadcast-type, high-resolution Integral Image and video sequences and performs three orders of magnitude faster than the analogous software approach. Keywords: Integral imaging, Disparity estimation, 3D image compression, FPGA, Real-time processing
... Sometimes, disparity has the same meaning as depth, because depth can be simply calculated using disparity and basic depth information is included in the disparity. A great deal of computer vision research has addressed this problem, because disparity contains useful information for various image applications including object recognition, inspection, manipulation [1], stereo sequence coding [2], and intermediate view generation [3], etc. So far, many active and passive methods have been used to recover depth information [4]. ...
Article
A new stereo matching scheme using a genetic algorithm is presented to improve the depth reconstruction method of stereo vision systems. Genetic algorithms are efficient search methods based on principles of population genetics, i.e. mating, chromosome crossover, gene mutation, and natural selection. The proposed approach considers the matching environment as an optimization problem and finds the optimal solution by using an evolutionary strategy. Accordingly, genetic operators are adapted for the circumstances of stereo matching: (1) an individual is a disparity set, (2) a chromosome has a 2D structure for handling image signals efficiently, and (3) a fitness function is composed of certain constraints which are commonly used in stereo matching. Since the fitness function consists of intensity, similarity and disparity smoothness, the matching and relaxation processes are considered at the same time in each generation. In order to acquire a disparity map consistent with the image appearance, a region of the input image, divided by zero-crossing points, is extracted and used in the determination of the chromosome shape. As a result, all chromosomes contain the external image form, and the disparity output coincides with the input image without any modification of the matching algorithm. In addition, an informed gene generation based on intensity difference is applied to reduce the searching space of the genetic operations.
... Many different architectures based on the MPEG compression scheme have been proposed for the implementation of video compression modules taking advantage of specific target devices. Recent research interest can be grouped to three main domains: A hardware implementation of a system for real-time stereoscopic videoconferencing with viewpoint adaptation is described by Ohm et al. [17]. In this system intermediate views at arbitrary positions are synthesized from the views of a stereoscopic camera system. ...
Article
This paper presents a novel hardware implementation of a disparity estimation scheme targeted to real-time Integral Photography (IP) image and video sequence compression. The software developed for IP image compression achieves high quality ratios over classic methodologies by exploiting the inherent redundancy that is present in IP images. However, there are certain time constraints to the software approach that must be confronted in order to address real-time applications. Our main effort is to achieve real-time performance by implementing in hardware the most time-consuming parts of the compression algorithm. The proposed novel digital architecture features minimized memory read operations and extensive simultaneous processing, while taking into concern the memory and data bandwidth limitations of a single FPGA implementation. Our results demonstrate that the implemented hardware system can successfully process high resolution IP video sequences in real-time, addressing a vast range of applications, from mobile systems to demanding desktop displays.
... In the literature, most of the multiview encoders that are compatible with existing standards are based on MPEG-2. The DISTIMA [2] project developed a system for capturing, coding, transmitting, and presenting digital stereoscopic image sequences, and the PANORAMA [3] project enhanced the visual information exchange with 3-D telepresence. Both can be integrated with MPEG-2. ...
Article
Intermediate virtual images are used in the evaluation of disparity estimations. The analysis is based on the effect of disparity inaccuracies in the relative quality of virtual images. The peak-signal-to-noise ratio and the percentage of visual errors are used to assess the quality of virtual images generated with distorted disparity fields. Computer simulation results show that the peak-signal-to-noise ratio is more affected by a small magnitude of perturbations (two or three pixels) than by a high frequency of perturbations (15% or 20%). However, the percentage of visual errors is more affected by the frequency of perturbations. By using distorted disparity fields, the deterioration in the quality of virtual images is imperceptible to the human eye.
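
For reference, the peak-signal-to-noise ratio used as a quality measure here can be computed as in the sketch below. It uses the standard definition PSNR = 10·log10(peak² / MSE) on greyscale images stored as nested lists; the function name `psnr` and the default peak value of 255 (8-bit images) are assumptions, and the paper's "percentage of visual errors" is not reproduced because the abstract does not define it.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[Sequence[float]],
         test: Sequence[Sequence[float]],
         peak: float = 255.0) -> float:
    """Peak-signal-to-noise ratio in dB between two equally sized greyscale images."""
    squared_errors = [
        (r - t) ** 2
        for row_r, row_t in zip(reference, test)
        for r, t in zip(row_r, row_t)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

Because the mean squared error grows with the square of each pixel difference, a few large deviations can lower the PSNR as much as many small ones.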
Article
To address the problems of looking around within a three-dimensional scene by freely choosing views and the difficulty of filming stereoscopic scenes from multiple views, this paper presents an intermediate view synthesis algorithm. The algorithm first extracts feature points from the edges of objects in the image, then builds a Delaunay triangle mesh from the feature points and calculates the disparity vector field by mesh mapping, and finally generates the intermediate view image by interpolation. Experimental results show that the intermediate view image has good quality, correctly reflects the content of the original views, and has a good visual effect.
Article
A new development in stereo image coding research is the application of deformable meshes. This paper proposes a feature-node selection method for stereo image coding algorithms based on the Delaunay triangular mesh, which makes full use of object edges to select feature nodes. When the Delaunay triangular mesh is established, objects can be separated according to their edges, so as to better reflect their complex movement and deformation. PSNR values and visual observations show that the proposed method is superior to traditional ones and reflects the details of object movement and deformation much better.
Article
3D graphics technology is currently one of the important research directions in the field of graphics. It has very broad application prospects and significant application value in scientific research, the military, education, industry, medicine, entertainment and many other fields. However, the data volume of high-resolution 3D graphics is huge, which places higher demands on storage, transmission and real-time display. In recent years, there has been strong interest in research on 3D graphics compression techniques. In this article, 3D graphics compression technology is combined with industrial flames to introduce a compression method for three-dimensional flames.
Article
We investigate the design of an interpolation filter for an MF-TDMA demodulator applied to DVB-RCS. If sampling is not synchronized to the data symbols, timing adjustment in a digital receiver must be performed by interpolation. In practice, the coefficients of a conventional sinc interpolation filter cannot extend to infinity. We propose a Kaiser window interpolation filter and a sinc interpolation filter using the Kaiser window. Simulation results show that the proposed interpolation filter improves performance.
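
One way to truncate the sinc interpolator with a Kaiser window is sketched below. The tap count, the shape parameter `beta`, the unit-gain normalisation and the names `bessel_i0` and `kaiser_sinc_taps` are assumptions made for illustration; the abstract does not give the filter parameters the authors used.

```python
import math
from typing import List


def bessel_i0(x: float) -> float:
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    term = 1.0
    total = 1.0
    m = 1
    while term > 1e-16 * total:
        term *= (x / (2.0 * m)) ** 2
        total += term
        m += 1
    return total


def kaiser_sinc_taps(mu: float, half: int = 4, beta: float = 6.0) -> List[float]:
    """Taps h[k], k = -half..half, that interpolate a sample at fractional
    offset mu in [0, 1): y(n + mu) is approximated by sum_k h[k] * x[n + k]."""
    taps = []
    for k in range(-half, half + 1):
        t = k - mu
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        # Kaiser window evaluated at t; it reaches zero at |t| = half + 1.
        ratio = t / (half + 1)
        window = bessel_i0(beta * math.sqrt(1.0 - ratio * ratio)) / bessel_i0(beta)
        taps.append(sinc * window)
    gain = sum(taps)
    return [h / gain for h in taps]  # normalise to unit DC gain
```

With `mu = 0` the filter reduces to a single unit tap, as expected, since the sinc function vanishes at all other integer positions. Larger `beta` values lower the side lobes at the cost of a wider transition band.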
Article
We propose a fast disparity estimation algorithm using background registration and object segmentation for stereo sequences from fixed cameras. Dense background disparity information is calculated in an initialization step, so that only disparities of moving object regions are updated in the main process. We propose a real-time segmentation technique using background subtraction and interframe differences, and a hierarchical disparity estimation using a region-dividing technique and shape-adaptive matching windows. Experimental results show that the proposed algorithm provides accurate disparity vector fields with an average processing speed of 15 frames/s for 320×240 stereo sequences on an ordinary PC.
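The segmentation step described here can be pictured with a very small sketch: a pixel is marked as belonging to a moving object if it differs enough from the registered background or from the previous frame. The way the two cues are combined (a logical OR), the threshold values and the function name are assumptions made for illustration; the abstract does not give these details.

```python
def segment_moving_object(frame, prev_frame, background, bg_thresh=20, diff_thresh=15):
    """Return a binary mask (lists of 0/1) of pixels assumed to belong to moving objects.

    frame, prev_frame and background are equally sized 2-D lists of gray values.
    The thresholds are hypothetical and would need tuning on real data.
    """
    mask = []
    for row, prev_row, bg_row in zip(frame, prev_frame, background):
        mask.append([
            1 if abs(p - b) > bg_thresh or abs(p - q) > diff_thresh else 0
            for p, q, b in zip(row, prev_row, bg_row)
        ])
    return mask
```

Only the pixels flagged in the mask would then need a disparity update, which is where the speed-up claimed in the abstract would come from.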
Article
A simple stereo matching algorithm using population-based incremental learning (PBIL) is proposed in this paper to mitigate general problems of genetic algorithms, such as memory consumption and inefficient search. PBIL is a variation of genetic algorithms that uses stochastic search and competitive learning based on a probability vector. The structure of PBIL is simpler than that of other genetic algorithm families, such as serial and parallel ones, due to the use of a probability vector. The PBIL strategy is simplified and adapted to stereo matching. Thus, the gene pool, chromosome crossover, and gene mutation are removed, while the evolution rule, that fitter chromosomes should have higher survival probabilities, is preserved. As a result, memory space is decreased, matching rules are simplified and computation cost is reduced. In addition, a scheme controlling the distance of neighbors for disparity smoothness is inserted to obtain wide-area consistency of disparities, similar to the result of coarse-to-fine matchers. Because of this scheme, the proposed algorithm can produce a stable disparity map with a small fixed-size window. Finally, an alternative version of the proposed algorithm that does not use a probability vector is also presented for simpler set-ups.
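For readers unfamiliar with PBIL, the following is a minimal sketch of the generic algorithm on bit strings: a probability vector is sampled to create candidates, and the vector is then pulled towards the fittest candidate. It is not the stereo-specific variant of the paper; the population size, learning rate, generation count and the toy fitness function used in the usage note are assumptions.

```python
import random

def pbil(fitness, n_bits, pop_size=30, learning_rate=0.1, generations=100, seed=0):
    """Generic population-based incremental learning on bit strings.

    Returns the best bit string seen and the final probability vector.
    """
    rng = random.Random(seed)
    prob = [0.5] * n_bits                       # start with no preference per bit
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        # Sample a population from the current probability vector.
        population = [[1 if rng.random() < p else 0 for p in prob]
                      for _ in range(pop_size)]
        winner = max(population, key=fitness)
        winner_fit = fitness(winner)
        if winner_fit > best_fit:
            best, best_fit = winner, winner_fit
        # Competitive learning: move each probability towards the winner's bit.
        prob = [(1.0 - learning_rate) * p + learning_rate * bit
                for p, bit in zip(prob, winner)]
    return best, prob
```

As a toy usage, `pbil(sum, 16)` maximizes the number of ones in a 16-bit string. Note that no explicit gene pool is kept between generations; only the probability vector persists, which is the memory saving the abstract refers to.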
Article
This article presents the development of a mathematical model and a hardware implementation of a digital filter for skin feature identification. With the increasing popularity of digital video, computer vision is becoming common in mainstream electronics. One of the general tasks required of consumer computer vision systems is image understanding, such as the detection and tracking of people. Before those tasks can be performed, the images have to be preprocessed. Such tasks can be very cumbersome for general-purpose computers. In such cases, skin color can be a very useful feature for achieving the goal. The core of color tracking is color-based image segmentation. The parametric digital filter introduced here filters digital images in the YCbCr color space and outputs digital images in the form of a binary mask. The primary purpose of this digital filter is to identify skin color tones in digital images, which is the main preprocessing task in the detection and tracking of skin-colored regions, but its parametric design also supports color detection in a more general sense. In contrast to classical digital filters known from digital audio and video signal processing, which are based on a convolution equation, this filter uses a multitude of threshold functions as its transfer function. Appropriate thresholds are defined in the HSV color space for the value and saturation components where the hue attribute is reliable. Therefore we use a polyhedron with appropriate threshold values that correspond to the skin-colored clusters with well-defined saturation and value components, based on a large sample set (16). Each pixel of the digital image that passes through the filter is labeled with the appropriate binary information. This binary information forms a binary mask of the same size as the original image, in which a logical one marks a skin-colored pixel.
Because the digital filter is intended for digital image preprocessing, a low-level hardware implementation is very suitable. The hardware implementation is shown with FPSLIC technology from Atmel, which combines high-performance FPGA programmable logic circuits and RISC microcontrollers. The given digital filter structure is suitable for low-level non-temporal preprocessing tasks that operate at pixel level. The architecture shown offers sufficient computing power or data bandwidth to satisfy the requirements of full-resolution real-time image processing in standard PAL picture formats. For the hardware implementation we developed a mathematical model that enables calculation of the digital filter parameters from given boundary values in the HSV color space. The digital filter hardware is designed entirely in the FPGA part of the FPSLIC circuit. The embedded FPSLIC RISC microcontroller features parameter
Article
This paper presents a real-time entropy compression/decompression unit for disparity map information used in 3D teleconferencing systems. The disparity map data form a constant bit rate data stream which has to be transmitted through an ATM channel supporting lower data rates. The selection of the proper compression algorithm must be based on the robustness of the regenerated data to various types of errors, which arise mainly from the limited available bandwidth. We first present the disparity map formats, the reasons why they should be coded losslessly, some well-known entropy coding algorithms, and their performance in terms of compression rate and throughput. All algorithms are evaluated according to the application requirements, which are: good compression rate, real-time implementability and fast convergence during the compression initialization phase. Feasible implementations of the selected algorithms are proposed using commercially available digital signal processors (DSPs) and field programmable gate arrays (FPGAs).
Chapter
Introduction; The Meaning of Telepresence in Videoconferencing; Multi-party Communication Using the Shared Table Concept; Experimental Systems for Immersive Videoconferencing; Perspective and Trends; Acknowledgements; References
Article
This paper discusses an approach to 3D-Television that is based on the Layered Depth Video (LDV) format. The LDV format contains explicit depth and occlusion information, allowing for the generation of novel viewpoints for stereoscopic and auto-stereoscopic multi-view displays. Thus, the format is effectively invariant to the display type and also allows the depth impression to be easily changed to best meet viewers' preferences for visual comfort. The major aspects of a content delivery chain based on the LDV format are discussed in this paper. The requirements placed on data acquisition are introduced, and a multi-camera system, which is well suited for LDV compliant data capture, is presented. Also discussed is the conversion of different input data streams, like standard stereo videos, multi-view data supplemented by depth data, and videos from wide baseline setups, to the LDV format. Moreover, the advantages of the LDV format in editing and mixing are examined. The paper also presents a transmission system based on currently available coding and transmission standards. Optimization of the bandwidth via different approaches to the compression of the LDV signal is analyzed, and the results of conducted experiments in this field are discussed. Finally, the aspects of perceptual human factors for the proper evaluation of 3D-TV services and the implemented LDV system are examined. This contribution reflects the efforts of the EU-funded project 3D4YOU to unify all aspects of 3D-TV production.
Conference Paper
Current trends in digital display technology show a marked interest in 3D displays, which allow three-dimensional images to be conveyed to viewers. Among the various 3D display techniques, autostereoscopic display appears promising because optical techniques at the display allow glasses-free viewing. However, the cost of generating and transmitting autostereoscopic images is usually quite high due to the huge amount of data. Hence, it is challenging to acquire artifact-free 3D images in real time. This paper presents a system to generate and display high-resolution autostereoscopic images at full HD resolution, i.e., 1920×1080×24. We show that the video-plus-depth data representation enables a scalable system architecture and efficient data transmission. The proposed GPU-accelerated depth image-based rendering (DIBR) algorithm and multiplexing algorithm are able to synthesize autostereoscopic images in real time. The synthesized images are then displayed on the autostereoscopic screen, which is mounted on a conventional LCD monitor. We demonstrate our system on both indoor activities and natural scenes.
Article
In recent years two European projects, DISTIMA and PANORAMA, have worked on stereoscopic and three-dimensional (3-D) video. While DISTIMA made it possible to transmit stereoscopic video compatible with MPEG-2 in real time, PANORAMA develops a 3-D video system: viewpoint-adaptive visualisation of scenes providing look-around capability. This paper describes selected parts of the work performed in the two projects: the DISTIMA real-time hardware for the MPEG-2 compatible transmission of stereoscopic video, the DISTIMA activities in the area of region- and object-based stereoscopic coding, the PANORAMA real-time hardware development for 3-D video based on disparity estimation and disparity-compensated interpolation of intermediate views, and the PANORAMA software development for 3-D video based on 3-D reconstruction and 3-D computer graphics.
Conference Paper
We present results of a comparative study of different fast disparity analysis approaches. Two of them are well-known standard algorithms, while the third is a new approach based on a hybrid block- and pixel-recursive matching scheme. The key idea of the new algorithm is to efficiently choose a small number of candidate vectors in order to reduce the computational effort while simultaneously achieving spatial and temporal consistency in the resulting disparity map. The latter aspect is very important for 3D video conferencing applications, where novel views of the conferee have to be synthesised in order to provide motion parallax. For this application, processing of video in ITU-Rec. 601 resolution is required. Our new algorithm is able to provide disparity vector fields for both directions (left→right and right→left) in real time on one 800 MHz Pentium III in reasonable quality. The different disparity algorithms are compared with respect to reliability, quality of the resulting disparities and speed of the algorithm.
Conference Paper
This contribution presents results of a comparative study on disparity analysis for convergent stereo systems. When the epipolar geometry is known, disparity analysis can be carried out on the original views by estimating along the epipolar line. If, however, a rectification is applied to these original views, axis-parallel views result and estimation along the horizontal scan line becomes possible. In evaluating the two methods, the computational effort required in each case and the quality of the disparity estimation are analyzed using various criteria.
Conference Paper
This paper introduces a new form of representation for three-dimensional video objects. We have developed a technique to extract disparity and texture data from video objects that are captured simultaneously with multiple-camera configurations. As a result, we obtain the video object plane as an unwrapped surface of a 3D object, containing all texture data visible from any of the cameras. This texture surface can be encoded like any 2D video object plane, while the 3D information is contained in the associated disparity map. It is then possible to reconstruct different viewpoints from the texture surface by simple disparity-based projection. The merits of the technique are efficient multiview encoding of single video objects, and support for viewpoint adaptation functionality, which is desirable in mixing natural and synthetic images. We have performed experiments with the MPEG-4 video verification model, where the disparity map is encoded using the tools provided for grayscale alpha data encoding. Due to its simplicity, the technique is suitable for applications that require real-time viewpoint adaptation towards video objects.
Article
Due to enormous progress in the areas of auto-stereoscopic 3D displays, digital video broadcast and computer vision algorithms, 3D television (3DTV) has reached a high technical maturity and many people now believe in its readiness for marketing. Experimental prototypes of entire 3DTV processing chains have been demonstrated successfully during the last few years, and the motion picture experts group (MPEG) of ISO/IEC has launched related ad hoc groups and standardization efforts envisaging the emerging market segment of 3DTV. In this context the paper discusses an advanced approach for a 3DTV service, which is based on the concept of video-plus-depth data representations. It particularly considers aspects of interoperability and multi-view adaptation for the case that different multi-baseline geometries are used for multi-view capturing and 3D display. Furthermore it presents algorithmic solutions for the creation of depth maps and depth image-based rendering related to this framework of multi-view adaptation. In contrast to other proposals, which are more focused on specialized configurations, the underlying approach provides a modular and flexible system architecture supporting a wide range of multi-view structures. © 2007 Elsevier B.V. All rights reserved.
Article
In this paper we present a visual input HCI system for wearable computers, the FingerMouse. It is a fully integrated stereo camera and vision processing system, with a specifically designed ASIC performing stereo block matching at 5 Mpixel/s (e.g. QVGA 320 × 240 at 30 fps) and a disparity range of 47, consuming 187 mW (78 mW in the ASIC). It is button-sized (43 mm × 18 mm) and can be worn on the body, capturing the user’s hand and processing in real-time its coordinates as well as a 1-bit image of the hand segmented from the background. Alternatively, the system serves as a smart depth camera, delivering foreground segmentation and tracking, depth maps and standard images, with a processing latency smaller than 1 ms. This paper describes the FingerMouse functionality and its applications, and how the specific architecture outperforms other systems in size, latency and power consumption.
Article
In this paper, stereo image reconstruction using regularized adaptive disparity estimation is proposed. By adaptively predicting the mutual correlation between the images of a stereo pair, the bandwidth of the stereo input pair can be compressed to the level of a conventional 2D image, and a predicted image can be effectively reconstructed from a reference image and disparity vectors. In the proposed algorithm, feature values are first extracted from the input stereo image pair. Then, a matching window for stereo matching is adaptively selected depending on the magnitude of these feature values: for regions with larger feature values a smaller matching window is selected, while for regions with smaller feature values a larger matching window is selected, by comparison with predetermined threshold values. This approach reduces both the mismatching of disparity vectors that occurs in conventional dense disparity estimation with a small matching window and the blocking effects that occur in coarse disparity estimation with a large matching window. In addition, a new regularized adaptive disparity estimation technique is proposed. By regularizing the estimated disparity vector with the neighboring disparity vectors, problems of the conventional adaptive disparity estimation scheme may be solved, and the predicted stereo image can be reconstructed more effectively. Experiments using the stereo sequences "Man", "Fichier", "Manege", and "Tunnel" show that the proposed algorithm improves the PSNR of a reconstructed image by about 6.90 dB on average at ±30 search ranges compared with conventional algorithms. It is also found that, in contrast to conventional algorithms, there is almost no difference between an original image and an image reconstructed with the proposed algorithm.
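The window-selection rule in this abstract can be sketched in a few lines. The abstract does not say which feature is used, so local intensity variance stands in for it here; the threshold values, window sizes and function names are likewise hypothetical.

```python
def local_variance(image, x, y, half=1):
    """Intensity variance in a (2*half+1)^2 neighbourhood, clipped at the image border."""
    height, width = len(image), len(image[0])
    values = [image[j][i]
              for j in range(max(0, y - half), min(height, y + half + 1))
              for i in range(max(0, x - half), min(width, x + half + 1))]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def select_window_size(feature_value, thresholds=(50.0, 400.0), sizes=(15, 9, 5)):
    """Map a feature value to a matching-window size.

    Small feature values (flat regions) get the largest window; values above
    every threshold (strong texture or edges) get the smallest window.
    """
    for threshold, size in zip(thresholds, sizes):
        if feature_value < threshold:
            return size
    return sizes[-1]
```

In a flat region the variance is near zero and the large window is chosen, which suppresses mismatches; next to a strong edge the small window is chosen, which limits blocking across the depth discontinuity.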
Article
"Telepresence" is an interesting field that includes virtual reality implementations with human-system interfaces, communication technologies, and robotics. This paper describes the development of a telepresence robot called Telepresence Robot for Interpersonal Communication (TRIC) for the purpose of interpersonal communication with the elderly in a home environment. The main aim behind TRIC's development is to allow elderly populations to remain in their home environments, while loved ones and caregivers are able to maintain a higher level of communication and monitoring than via traditional methods. TRIC aims to be a low-cost, lightweight robot, which can be easily implemented in the home environment. Under this goal, decisions on the design elements included are discussed. In particular, the implementation of key autonomous behaviors in TRIC to increase the user's capability of projection of self and operation of the telepresence robot, in addition to increasing the interactive capability of the participant as a dialogist are emphasized. The technical development and integration of the modules in TRIC, as well as human factors considerations are then described. Preliminary functional tests show that new users were able to effectively navigate TRIC and easily locate visual targets. Finally the future developments of TRIC, especially the possibility of using TRIC for home tele-health monitoring and tele-homecare visits are discussed.
Conference Paper
A blind image restoration method based on the genetic algorithm (GA) and fuzzy control (FC) is proposed for situations in which the type of the space-variant point spread function (PSF) is unknown and the additive noise is severe. The image is divided into blocks by triangle meshes. The standard genetic algorithm (SGA) and the micro-population genetic algorithm (micro-PGA) are used alternately to estimate the PSFs and their corresponding image blocks, respectively. At the end of each evolutionary generation of the SGA, the best estimated PSF is corrected by the parametric models through FC; likewise, the best estimated image block is corrected according to the histogram statistics after each iteration of the micro-PGA. Experimental results show that the presented method restores space-variant blurred images effectively and suppresses noise strongly.
Conference Paper
In this paper, a fast depth estimation method using arbitrarily configured stereo vision is proposed. The key idea of the method is to use wavelet transform modulus maxima as feature points in the process of epipolar line rectification and image pair matching. Wavelet transform modulus maxima are first extracted for the image pair at a coarse scale. Then the stereo rectification process is applied only to these maxima. After that, the stereo matching is calculated based on the extracted modulus maxima. These steps are iterated at finer scales, where the search intervals for the stereo matching points are limited to the neighborhood of the matches at the previous scale. Finally, as an optional step, a dense disparity map can be produced by a traditional method using the previously obtained sparse disparity map. An experiment on the corridor image pair demonstrates the validity of the method.
Conference Paper
In this paper, a hybrid stereo matching algorithm based on feature and area processing (HAFA) is presented. First, edge features are extracted and matched using the wavelet transform modulus maxima representation to obtain a sparse disparity map. In this step, edge points are detected by finding the modulus maxima of the wavelet transform of the stereo images. At coarse scales, the local modulus maxima are displaced and only sharp edges are detected because of the smoothing of the images. At fine scales, many maxima are created by image noise. We eliminate the influence of the noise by adding a threshold to the process of finding the maxima, and we introduce the normalized cross-correlation criterion into the discrete wavelet transform domain to match the features. Pixels outside the edges are then matched using an area-based algorithm under the constraint of the acquired disparity map. The HAFA inherits the accuracy of feature-based approaches and simplifies the search procedure of the area-based method. The dense disparity maps produced by the HAFA indicate that this algorithm is effective and of great value.
Conference Paper
The paper proposes a stereoscopic and multiview video sequences encoder compatible with MPEG-4. The main view of the sequences is encoded as an MPEG-4 bitstream and the auxiliary views are encoded by joint motion and disparity compensation, so that both temporal and spatial redundancy are reduced and high compression is achieved. The disparity and motion fields for both main and auxiliary views are calculated using a joint disparity-motion estimation and regularization algorithm. We test the proposed encoder for coding stereoscopic (2-view) and 3-view video sequences and compare it with the MPEG multi-view profile (MVP) implemented on the MPEG-4 platform. The proposed system achieves higher image quality at the same bitrate than existing multiview video encoders and is suitable for various applications, such as video-conferencing and 3D telepresence systems.
Conference Paper
This paper develops a novel computational framework for accurate, robust, and efficient stereo analysis: a hierarchical scheme based on the combination of feature- and area-based matching, called the hierarchical hybrid matching algorithm (HHM). The HHM inherits the accuracy of feature-based approaches, simplifies the procedure of feature matching and produces robust and dense disparity fields. Moreover, the HHM greatly reduces the dependency on the quantity and quality of the features extracted from the original images. In addition, unlike existing area-based approaches, the HHM can generate global disparity fields, produce more precise disparity estimates at object edges and avoid searching blindly over a wide range. Experiments are carried out on a group of widely used stereo images.
Conference Paper
Viewpoint adaptation, or “look-around” capability, is likely to be vital to the success of future immersive multimedia services such as 3D video-conferencing, virtual reality and 3DTV. This paper proposes a low-complexity and efficient compression system for multiview images using quadtree disparity estimation and multiview synthesis. Previous approaches to the problem have generally involved the transmission of a coded stereo image pair, usually using fixed block size disparity compensation, along with a pair of dense disparity maps used for intermediate view synthesis. In this work, the quadtree disparity map coded implicitly in the stereo compression process is re-used in the multiview synthesis process. The other map is also efficiently represented using a quadtree structure at low resolution. Improved multiview compression results are presented, along with intermediate views synthesised at the decoder.
Article
Stereo vision provides a direct way of inferring depth information by using two images destined for the left and right eye respectively. A stereoscopic pair of image sequences, recorded with a difference in the view angle, allows 3D perception of the scene by the human observer when the respective image sequence is presented to each eye. The effectiveness of object-based coding for stereoscopic and 3D image sequences is discussed. Object-based coding alleviates the problem of annoying coding errors and provides a more natural representation of the scene, but it requires a complex analysis phase to segment the scene into objects and estimate their motion and structure.
Article
The estimation of correspondences in natural image pairs plays an important role in a large number of applications such as video coding, frame rate conversion, multi-viewpoint image generation, camera calibration, 3D from stereo, and structure from motion. An overview of the techniques for dense geometric correspondence estimation is presented. Different types of image pairs are discussed. Several improvements for correspondence estimation in image pairs are envisaged, including the estimation of all pseudo-correspondences, the incorporation of image restoration models, modeling of the specular reflectivity of scene surfaces, the use of image sequences, and the application of epipolar geometry.
Article
A new studio system for the production of three-dimensional (3-D) content is introduced. The system combines the ability to capture dynamic scenes, based on a multicamera system in a chroma-key environment, with a view-dependent projection for actor feedback. The system allows the generation and rendering of 3-D models in preview quality for on-set visualization in real time and in high quality for postproduction applications in an offline phase. The shape reconstruction is based on the computation of the visual hull. The view-dependent projection component allows actors to be immersed into a virtual scene and to interact with virtual components. The functional modules of the system were implemented in a highly distributed system using standard, inexpensive IT components. Therefore, the system is quite scalable and can be fitted into most studio environments. For the integration of the system components, a middleware architecture was developed.
Article
In this paper we present a combination of three steps to code a disparity map for 3D teleconferencing applications. First we introduce a new disparity map format, the chain map, which has a very low inherent redundancy. Additional advantages of this map are: one single bidirectional map instead of the usual two unidirectional vector fields, explicit indication of occlusions, no upper or lower bound on disparity values, no disparity offset, easy generation by disparity estimators and easy interpretation by image interpolators. In a second step, we apply data reduction to the chain map. The reduction is by a factor of two, at the cost of losing explicit information about the position of occlusion areas. An algorithm for image interpolation in the absence of occlusion information is presented. The third step involves entropy coding, both lossless and lossy. A scheme specially suited for the chain map has been developed. Although the codec is based on a simple prediction process without motion compensatio...
Article
This paper describes algorithms that were developed for a stereoscopic videoconferencing system with viewpoint adaptation. The system identifies foreground and background regions, and applies disparity estimation to the foreground object, namely the person sitting in front of a stereoscopic camera system with rather large baseline. A hierarchical block matching algorithm is employed for this purpose, which takes into account the position of high-variance feature points and the object/background border positions. Using the disparity estimator's output, it is possible to generate arbitrary intermediate views from the left- and right-view images. We have developed an object-based interpolation algorithm, which produces high-quality results. It takes into account the fact that a person's face has a more or less convex surface. Interpolation weights are derived both from the position of the intermediate view, and from the position of a specific point within the face. The algorithms have be...
Article
An algorithm and its implementation to compute the surface map of a scene from a stereo pair of images are described. This algorithm differs from other stereo algorithms in that it uses smoothness of the resulting surface as a criterion for matching. Planar and quadratic patches are used as local models of the surface. A multiresolution hierarchy of surface maps is generated, starting from the coarse and progressing towards the fine. At each stage the surface interpolation process takes into account the detected occluding and ridge contours in the scene, which are places where depth and orientation change abruptly. Interpolation is performed within regions enclosed by these contours. Occluded regions in the image are identified and are not used for matching and interpolation. The algorithm assumes that surfaces in the real world are smooth and continuous except across relatively rare occluding and ridge contours.
Article
The digital stereo image processing literature is briefly reviewed, and an automatic system developed for digital matching of points in aerial stereo imagery is described. The system uses area-based correlation, but couples this very basic measure of match with a variety of novel search techniques to develop a disparity model for a given stereo image pair. The techniques used are hierarchical in nature, incorporate iterative refinement, and use a best-first strategy in the matching process.
Article
We propose a head-tracking autostereoscopic display system based on magnetic-sensor tracking of the viewer's side-to-side location, and optical slewing of a stereoscopic image-pair array projected onto a lenticular screen so as to keep the images received by the viewer's eyes distinct. Viewer distance changes are accommodated by slight magnification changes of the projected image array. A high-definition-TV LCD projector is used with a 178-cm (70-in.) lenticular sheet (diagonal measurement) to provide impressive computer-generated 3D images over a particularly wide viewing zone. The system finds application in a virtual-space teleconferencing system that requires a large-scale stereoscopic display without the use of special viewing glasses.
Article
Motion is an important and fundamental source of visual information. It is well known that the pattern of image motion contains information useful for the determination of the 3-dimensional structure of the environment and the relative motion between the camera and the objects in the scene. However, the accurate measurement of image motion from a sequence of real images has proven to be difficult. In this thesis, a hierarchical framework for the computation of dense displacement fields from pairs of images, and an integrated system consistent with that framework are described. Each input intensity image is first decomposed using a set of spatial-frequency tuned channels. The information in the low-frequency channels is used to provide rough displacements over a large range, which are then successively refined by using the information in the higher-frequency channels. Within each channel, a direction-dependent confidence measure is computed for each displacement vector, and a smoothness constraint is used to propagate reliable displacement vectors to their neighboring areas with less reliable vectors. For our integrated system, Burt's Laplacian pyramid transform is used for the spatial-frequency decomposition, and the minimization of the sum of squared differences measure (SSD) is used as the match criterion. The confidence measure is derived from the shape of the SSD surface, and the smoot
Article
An approach is described that integrates the processes of feature matching, contour detection, and surface interpolation to determine the three-dimensional distance, or depth, of objects from a stereo pair of images. Integration is necessary to ensure that the detected surfaces are smooth. Surface interpolation takes into account detected occluding and ridge contours in the scene; interpolation is performed within regions enclosed by these contours. Planar and quadratic patches are used as local models of the surface. Occluded regions in the image are identified, and are not used for matching and interpolation. A coarse-to-fine algorithm is presented that generates a multiresolution hierarchy of surface maps, one at each level of resolution. Experimental results are given for a variety of stereo images.
Chapter
The first basic standards for videoconferencing services were specified several years ago, and the European Videoconferencing Service (EVS) was launched in 1989. However, the uptake of videoconferencing services was less enthusiastic than predicted by some experts. Apart from high costs and low availability, one reason for the poor uptake might be that some aspects of face-to-face meetings cannot be represented sufficiently well with standard videoconferencing equipment. Considering this, a research project was initiated to investigate the advantages of new technologies such as stereoscopic video in videoconferencing services in terms of telepresence. A communication system is said to have a higher degree of telepresence the more it gives the distributed participants a feeling of sharing space with their remote partners. This paper deals with the benefits of stereoscopic image representation and individual perspective in interpersonal communications applications.
Article
This paper presents a stereo matching algorithm using the dynamic programming technique. The stereo matching problem, that is, obtaining a correspondence between right and left images, can be cast as a search problem. When a pair of stereo images is rectified, pairs of corresponding points can be searched for within the same scanlines. We call this search intra-scanline search. This intra-scanline search can be treated as the problem of finding a matching path on a two-dimensional (2D) search plane whose axes are the right and left scanlines. Vertically connected edges in the images provide consistency constraints across the 2D search planes. Inter-scanline search in a three-dimensional (3D) search space, which is a stack of the 2D search planes, is needed to utilize this constraint. Our stereo matching algorithm uses edge-delimited intervals as elements to be matched, and employs the two above-mentioned searches: one is inter-scanline search for possible correspondences of connected edges in right and left images and the other is intra-scanline search for correspondences of edge-delimited intervals on each scanline pair. Dynamic programming is used for both searches, which proceed simultaneously: the former supplies the consistency constraint to the latter while the latter supplies the matching score to the former. An interval-based similarity metric is used to compute the score. The algorithm has been tested with different types of images including urban aerial images, synthesized images, and block scenes, and its computational requirement has been discussed.
Conference Paper
This paper is a condensed version of the author's thesis [Bolles 1976], which investigates a subclass of visual information processing referred to as verification vision (abbreviated VV). VV uses a model of a scene to locate objects of interest in a ...
Conference Paper
The past few years have seen a growing interest in the application of three-dimensional image processing. With the increasing demand for 3-D spatial information for tasks of passive navigation [7,12], automatic surveillance [9], aerial cartography [10,13], and inspection in industrial automation, the importance of effective stereo analysis has been made quite clear. A particular challenge is to provide reliable and accurate depth data for input to object or terrain modelling systems (such as [5]). This paper describes an algorithm for such stereo sensing. It uses an edge-based line-by-line stereo correlation scheme, and appears to be fast, robust, and parallel implementable. The processing consists of extracting edge descriptions for a stereo pair of images, linking these edges to their nearest neighbors to obtain the edge connectivity structure, correlating the edge descriptions on the basis of local edge properties, then cooperatively removing those edge correspondences determined to be in error - those which violate the connectivity structure of the two images. A further correlation process, using a technique similar to that used for the edges, is applied to the image intensity values over intervals defined by the previous correlation. The result of the processing is a full image array disparity map of the scene viewed.
Article
A stereo algorithm is presented that optimizes a maximum likelihood cost function. The maximum likelihood cost function assumes that corresponding features in the left and right images are normally distributed about a common true value and consists of a weighted squared error term if two features are matched or a (fixed) cost if a feature is determined to be occluded. The stereo algorithm finds the set of correspondences that maximize the cost function subject to ordering and uniqueness constraints. The stereo algorithm is independent of the matching primitives. However, for the experiments described in this paper, matching is performed on the individual pixel intensities. Contrary to popular belief, the pixel-based stereo appears to be robust for a variety of images. It also has the advantages of (i) providing a dense disparity map, (ii) requiring no feature extraction, and (iii) avoiding the adaptive windowing problem of area-based correlation methods. Because feature extraction and windowing are unnecessary, a very fast implementation is possible. Experimental results reveal that good stereo correspondences can be found using only ordering and uniqueness constraints, i.e., without local smoothness constraints. However, it is shown that the original maximum likelihood stereo algorithm exhibits multiple global minima. The dynamic programming algorithm is guaranteed to find one, but not necessarily the same one for each epipolar scanline, causing erroneous correspondences which are visible as small local differences between neighboring scanlines. Traditionally, regularization, which modifies the original cost function, has been applied to the problem of multiple global minima. We developed several variants of the algorithm that avoid classical regularization while imposing several global cohesiveness constraints. We believe this is a novel approach that has the advantage of guaranteeing that solutions minimize the original cost function and preserve discontinuities.
The constraints are based on minimizing the total number of horizontal and/or vertical discontinuities along and/or between adjacent epipolar lines, and local smoothing is avoided. Experiments reveal that minimizing the sum of the horizontal and vertical discontinuities provides the most accurate results. A high percentage of correct matches and very little smearing of depth discontinuities are obtained. An alternative to imposing cohesiveness constraints to reduce the correspondence ambiguities is to use more than two cameras. We therefore extend the two camera maximum likelihood to N cameras. The N-camera stereo algorithm determines the “best” set of correspondences between a given pair of cameras, referred to as the principal cameras. Knowledge of the relative positions of the cameras allows the 3D point hypothesized by an assumed correspondence of two features in the principal pair to be projected onto the image plane of the remaining N − 2 cameras. These N − 2 points are then used to verify proposed matches. Not only does the algorithm explicitly model occlusion between features of the principal pair, but the possibility of occlusions in the N − 2 additional views is also modeled. Previous work did not model this occlusion process, the benefits and importance of which are experimentally verified. Like other multiframe stereo algorithms, the computational and memory costs of this approach increase linearly with each additional view. Experimental results are shown for two outdoor scenes. It is clearly demonstrated that the number of correspondence errors is significantly reduced as the number of views/cameras is increased.
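The two-camera cost structure in this abstract (a squared-error term for a matched pair, a fixed cost for an occluded feature, ordering and uniqueness enforced by dynamic programming) can be sketched for one scanline pair as follows. This is a simplified illustration, not the authors' implementation: the unweighted squared intensity difference, the occlusion cost value, and the function name `match_scanline` are assumptions, and the cohesiveness constraints and the N-camera extension are left out.

```python
def match_scanline(left, right, occlusion_cost):
    """Align two intensity scanlines by dynamic programming.

    Returns (total_cost, matches), where matches is an ordered list of
    (left_index, right_index) pairs. Pixels in no pair count as occluded.
    """
    n, m = len(left), len(right)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):  # only left pixels consumed so far: all occluded
        cost[i][0] = i * occlusion_cost
        move[i][0] = "L"
    for j in range(1, m + 1):  # only right pixels consumed so far: all occluded
        cost[0][j] = j * occlusion_cost
        move[0][j] = "R"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = left[i - 1] - right[j - 1]
            candidates = (
                (cost[i - 1][j - 1] + diff * diff, "M"),   # match the two pixels
                (cost[i - 1][j] + occlusion_cost, "L"),    # left pixel occluded
                (cost[i][j - 1] + occlusion_cost, "R"),    # right pixel occluded
            )
            cost[i][j], move[i][j] = min(candidates)
    # Walk back from the end of both scanlines to recover the matched pairs.
    matches = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "M":
            matches.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "L":
            i -= 1
        else:
            j -= 1
    matches.reverse()
    return cost[n][m], matches


left_line = [10, 10, 50, 90, 90]
right_line = [10, 50, 90, 90, 90]
total_cost, pairs = match_scanline(left_line, right_line, occlusion_cost=100)
```

Because every step moves forward in at least one scanline and a pixel is consumed exactly once, the recovered pairs respect the ordering and uniqueness constraints by construction. Several alignments can share the same minimal cost, which is the ambiguity the abstract attributes to multiple global minima.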
Article
Thesis (Ph. D.)--University of Illinois, 1981. Includes bibliographical references (leaves 111-115). Photocopy.
Article
A computational approach to image matching is described. It uses multiple attributes associated with each image point to yield a generally overdetermined system of constraints, taking into account possible structural discontinuities and occlusions. In the algorithm implemented, intensity, edgeness, and cornerness attributes are used in conjunction with the constraints arising from intraregional smoothness, field continuity and discontinuity, and occlusions to compute dense displacement fields and occlusion maps along the pixel grids. The intensity, edgeness, and cornerness are invariant under rigid motion in the image plane. In order to cope with large disparities, a multiresolution multigrid structure is employed. Coarser level edgeness and cornerness measures are obtained by blurring the finer level measures. The algorithm has been tested on real-world scenes with depth discontinuities and occlusions. A special case of two-view matching is stereo matching, where the motion between two images is known. The algorithm can be easily specialized to perform stereo matching using the epipolar constraint.
Article
Compressibility of individual sequences by the class of generalized finite-state information-lossless encoders is investigated. These encoders can operate in a variable-rate mode as well as a fixed-rate one, and they allow for any finite-state scheme of variable-length-to-variable-length coding. For every individual infinite sequence x, a quantity ρ(x) is defined, called the compressibility of x, which is shown to be the asymptotically attainable lower bound on the compression ratio that can be achieved for x by any finite-state encoder. This is demonstrated by means of a constructive coding theorem and its converse that, apart from their asymptotic significance, also provide useful performance criteria for finite and practical data-compression tasks. The proposed concept of compressibility is also shown to play a role analogous to that of entropy in classical information theory where one deals with probabilistic ensembles of sequences rather than with individual sequences. While the definition of ρ(x) allows a different machine for each different sequence to be compressed, the constructive coding theorem leads to a universal algorithm that is asymptotically optimal for all sequences.
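The abstract does not spell out the universal algorithm. A minimal sketch, assuming it is the incremental-parsing scheme commonly known as LZ78, is given below: each new phrase is the longest phrase seen so far extended by one symbol. The function names and the convention of emitting an empty symbol for a trailing phrase are illustrative choices, not taken from the paper.

```python
def lz78_parse(sequence):
    """Parse a string into (prefix_index, symbol) pairs by incremental parsing.

    Index 0 denotes the empty phrase; phrase k is the k-th pair emitted.
    """
    dictionary = {"": 0}
    pairs = []
    phrase = ""
    for symbol in sequence:
        candidate = phrase + symbol
        if candidate in dictionary:
            phrase = candidate  # keep extending a phrase that was seen before
        else:
            pairs.append((dictionary[phrase], symbol))
            dictionary[candidate] = len(dictionary)
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit it with an empty symbol.
        pairs.append((dictionary[phrase], ""))
    return pairs


def lz78_decode(pairs):
    """Rebuild the original string from the (prefix_index, symbol) pairs."""
    phrases = [""]
    for index, symbol in pairs:
        phrases.append(phrases[index] + symbol)
    return "".join(phrases[1:])


parsed = lz78_parse("abababab")
restored = lz78_decode(parsed)
```

The number of phrases grows slowly for repetitive input, so the phrase count relative to the sequence length gives a rough, finite-length feel for the notion of compressibility discussed in the abstract.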
Article
In videoconferencing the interlocutors usually cannot experience eye contact. This paper describes an image analysis and synthesis method for constructing images of virtual cameras at intermediate positions between three real cameras. Special emphasis lies on the treatment of occlusions and on the smoothing of temporal disparity fields in image sequences. The method has been tested in simulated videoconferencing sessions where the establishment of true eye contact could be shown. A short description of the image processing approach and of a human factors experiment on the subjective effect of images of virtual cameras is given.
Introduction: In conventional videoconferencing systems, the camera is usually placed on top of the monitor (Fig. 1). Thus when a conferee looks at his/her interlocutor who is being displayed on the monitor, there exists an angular deviation between the camera shooting axis and the line of sight of the interlocutor (eye-contact parallax). This deviation impede...
Advanced videocommunications with stereoscopy and individual perspective, in Towards a Pan-European Telecommunication Service Infrastructure - IS&N '94, Kugler et
  • K Hopf
  • D Runde
  • M Böcker
K. Hopf, D. Runde and M. Böcker: Advanced videocommunications with stereoscopy and individual perspective, in Towards a Pan-European Telecommunication Service Infrastructure - IS&N '94, Kugler et al. (eds.), Berlin, Heidelberg, New York: Springer 1994.
TMS320C4x Module Specification, Version 1.01, 1993
  • Texas Instruments
Texas Instruments, TIM-40, TMS320C4x Module Specification, Version 1.01, 1993. J.-R. Ohm et al. / Signal Processing: Image Communication 14 (1998) 147–171
2-channel lenticular system for 3D-imaging with tracked projectors, HHI Annual Report
  • R Börner
R. Börner, 2-channel lenticular system for 3D-imaging with tracked projectors, HHI Annual Report 1996, HHI, Berlin, 1997.
A multiscale approach to the joint computation of motion and disparity: Application to the synthesis of intermediate views
  • B Chupeau
B. Chupeau, A multiscale approach to the joint computation of motion and disparity: Application to the synthesis of intermediate views, in: Proc. 4th Europ. Workshop on Three-Dimension. Televis., Rome, Italy, October 1993, pp. 223–230.
2-channel lenticular system for 3D-imaging with tracked projectors
  • R Börner
R. Börner: "2-channel lenticular system for 3D-imaging with tracked projectors," HHI Annual Report 1996, Berlin: HHI 1997.
A computational approach to establish eye-contact in videocommunication
  • J Liu
  • I P Beldie
  • M Wöpking