Chapter

Environment Estimation for Glossy Reflections in Mixed Reality Applications Using a Neural Network


Abstract

Environment textures are used to illuminate virtual objects within a virtual scene and are crucial for high-quality lighting and reflections. In an augmented reality context, lighting is particularly important to seamlessly embed a virtual object within the real-world scene. To achieve this, the illumination of the environment has to be captured and kept up to date with the current lighting conditions. In this paper, we present a novel approach that stitches the current camera image onto a cube map. This cube map is extended in every frame and fed into a neural network that estimates the missing parts. Finally, the output of the neural network and the stitched information gathered so far are fused, making even mirror-like reflections possible on mobile devices. We provide an image-stream stitching approach combined with a neural network to create plausible, high-quality environment textures that can be used for image-based lighting within mixed reality environments.
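The abstract suggests a simple per-frame loop: stitch, complete, fuse. Below is a minimal Python sketch of that loop; `project_to_cube` (mapping camera pixels to cube-map texels from the tracked pose) and `completion_net` (the trained inpainting network) are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

FACE_RES = 256  # per-face cube-map resolution (illustrative)

def update_environment(cube_map, coverage, frame, camera_pose,
                       completion_net, project_to_cube):
    """One pass of the stitch -> complete -> fuse loop.

    cube_map    : (6, FACE_RES, FACE_RES, 3) float32, colors stitched so far
    coverage    : (6, FACE_RES, FACE_RES) bool, texels already observed
    frame       : (H, W, 3) float32, current camera image
    camera_pose : 4x4 camera-to-world matrix from the AR tracker
    """
    # 1) Stitch: splat the current frame onto the cube map using the tracked pose.
    face, v, u, colors = project_to_cube(frame, camera_pose, FACE_RES)
    cube_map[face, v, u] = colors
    coverage[face, v, u] = True

    # 2) Complete: the network hallucinates plausible content for unseen texels.
    predicted = completion_net(cube_map, coverage)      # same shape as cube_map

    # 3) Fuse: keep observed texels, take the prediction everywhere else.
    mask = coverage[..., None].astype(np.float32)
    return mask * cube_map + (1.0 - mask) * predicted
```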

References
Conference Paper
How much does a single image reveal about the environment it was taken in? In this paper, we investigate how much of that information can be retrieved from a foreground object, combined with the background (i.e. the visible part of the environment). Assuming it is not perfectly diffuse, the foreground object acts as a complexly shaped and far-from-perfect mirror. An additional challenge is that its appearance confounds the light coming from the environment with the unknown materials it is made of. We propose a learning-based approach to predict the environment from multiple reflectance maps that are computed from approximate surface normals. The proposed method allows us to jointly model the statistics of environments and material properties. We train our system from synthesized training data, but demonstrate its applicability to real-world data. Interestingly, our analysis shows that the information obtained from objects made out of multiple materials is often complementary and leads to better performance.
Conference Paper
Physically based approaches are increasingly used across a wide range of computer graphics applications. With them, modern graphics engines can provide realistic output using physically correct values instead of analytical approximations. Such applications apply the final lighting to a geometry buffer to reduce complexity. Using this approach for Mediated Reality applications, some changes have to be made in order to fuse the real with the virtual world. In this paper, we present an approach focusing on the extraction of real-world environment information and saving it directly to the geometry buffer. To this end, we introduce a solution using spatial geometry to integrate the real world into the virtual environment. The approach is usable in real time and allows for visual interaction between virtual and real-world objects. Moreover, a manipulation of the real world is easily possible.
Article
In this work, we propose a method to infer high dynamic range illumination from a single, limited field-of-view, low dynamic range photograph of an indoor scene. Inferring scene illumination from a single photograph is a challenging problem. The pixel intensities observed in a photograph are a complex function of scene geometry, reflectance properties, and illumination. We introduce an end-to-end solution to this problem and propose a deep neural network that takes the limited field-of-view photo as input and produces an environment map as a panorama and a light mask prediction over the panorama as the output. Our technique does not require special image capture or user input. We preprocess standard low dynamic range panoramas by introducing novel light source detection and warping methods on the panorama, and use the results with corresponding limited field-of-view crops as training data. Our method does not rely on any assumptions on scene appearance, geometry, material properties, or lighting. This allows us to automatically recover high-quality illumination estimates that significantly outperform previous state-of-the-art methods. Consequently, using our illumination estimates for applications like 3D object insertion leads to results that are photo-realistic, which we demonstrate over a large set of examples and via a user study.
Conference Paper
Mobile devices are becoming more and more important today, especially for augmented reality (AR) applications in which the camera of the mobile device acts like a window into the mixed reality world. Up to now, no photorealistic augmentation has been possible, since the computational power of mobile devices is still too weak. Even a streaming solution from a stationary PC would cause a latency that affects user interactions considerably. Therefore, we introduce a differential illumination method that allows for a consistent illumination of the inserted virtual objects on mobile devices, avoiding a delay. The necessary computation effort is shared between a stationary PC and the mobile devices to make use of the capacities available on both sides. The method is designed such that only a minimum amount of data has to be transferred asynchronously between the stationary PC and one or multiple mobile devices. This allows for an interactive illumination of virtual objects with a consistent appearance under both temporally and spatially varying real illumination conditions. To describe the complex near-field illumination in an indoor scenario, multiple HDR video cameras are used to capture the illumination from multiple directions. In this way, sources of illumination can be considered that are not directly visible to the mobile device because of occlusions and the limited field of view of built-in cameras.
Article
The paper discusses a panoramic vision system for autonomous-navigation purposes. It describes an economic PC-based method for integrating data from multiple camera sources in real time. The views from adjacent cameras are visualized together as a panorama of the scene using a modified correlation-based stitching algorithm. A separate operator is presented with a particular slice of the panorama matching the user's viewing direction. Additionally, a simulated environment is created where the operator can choose to augment the video by simultaneously viewing an artificial three-dimensional (3-D) view of the scene. Potential applications of this system include enhancing quality and range of visual cues, and navigation under hostile circumstances where direct view of the environment is not possible or desirable.
Article
Accurate registration between real and virtual objects is crucial for augmented reality applications. Existing tracking methods are individually inadequate: magnetic trackers are inaccurate, mechanical trackers are cumbersome, and vision-based trackers are computationally problematic. We present a hybrid tracking method that combines the accuracy of vision-based tracking with the robustness of magnetic tracking without compromising real-time performance or usability. We demonstrate excellent registration in three sample applications.
Conference Paper
We present a method for recovering both incident lighting and surface materials from casually scanned geometry. By casual, we mean a rapid and potentially noisy scanning procedure of unmodified and uninstrumented scenes with a commodity RGB-D sensor. In other words, unlike reconstruction procedures which require careful preparations in a laboratory environment, our method works with input that can be obtained by consumer users. To ensure a robust procedure, we segment the reconstructed geometry into surfaces with homogeneous material properties and compute the radiance transfer on these segments. With this input, we solve the inverse rendering problem of factorization into lighting and material properties using an iterative optimization in spherical harmonics form. This allows us to account for self-shadowing and recover specular properties. The resulting data can be used to generate a wide range of mixed reality applications, including the rendering of synthetic objects with matching lighting into a given scene, but also re-rendering the scene (or a part of it) with new lighting. We show the robustness of our approach with real and synthetic examples under a variety of lighting conditions and compare them with ground truth data.
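As a small illustration of the spherical-harmonics representation such an inverse-rendering factorization operates on (not the authors' solver), the sketch below projects radiance samples of a surface segment onto an order-2 real SH basis using uniform Monte Carlo weights:

```python
import numpy as np

def sh_basis_order2(d):
    """Real SH basis functions up to l = 2 for unit directions d of shape (N, 3)."""
    x, y, z = d[:, 0], d[:, 1], d[:, 2]
    return np.stack([
        0.282095 * np.ones_like(x),                      # l = 0
        0.488603 * y, 0.488603 * z, 0.488603 * x,        # l = 1
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z, 0.546274 * (x * x - y * y),    # l = 2
    ], axis=1)                                           # (N, 9)

def project_radiance(directions, radiance):
    """Monte Carlo projection of sampled radiance onto 9 SH coefficients per channel.

    directions : (N, 3) unit vectors uniformly distributed on the sphere
    radiance   : (N, 3) RGB radiance observed along those directions
    """
    basis = sh_basis_order2(directions)                  # (N, 9)
    weight = 4.0 * np.pi / directions.shape[0]           # uniform-sphere MC weight
    return weight * basis.T @ radiance                   # (9, 3) coefficients
```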
Conference Paper
This paper presents an effective approach to rendering virtual 3D objects using real-time image based lighting (IBL) with conventional 360° panoramic video. Raw 360° panoramic video captured in a low dynamic range setup is the only light source used for the real-time IBL rendering. Input video data is boosted to high dynamic range using inverse tone mapping. This converted video is then reconstructed into low-resolution diffuse radiance maps to speed up diffuse rendering. A mipmap-based specular sampling scheme provides fast GPU rendering even for glossy specular objects. Since our pipeline does not require any precomputation, it can support a live 360° panoramic video stream as the radiance map, and the process fits easily into a standard rasterization pipeline. The results provide sufficient performance for IBL via stereo head mounted display (HMD), an ideal device for immersive augmented reality films and games using 360° panoramic videos as both the lighting and backdrop for illumination composition.
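The inverse tone mapping step can be illustrated with a deliberately naive expansion operator; the paper's actual operator is not specified here, so the curve below is only an assumption for demonstration:

```python
import numpy as np

def expand_ldr_to_hdr(ldr, gamma=2.2, max_boost=8.0):
    """Naive inverse tone mapping: linearize an 8-bit frame and boost bright pixels.

    ldr : (H, W, 3) uint8 frame from the 360° video
    Returns a float32 radiance estimate; pixels near 1.0 are expanded towards
    `max_boost` so that they behave like light sources during IBL.
    """
    lin = (ldr.astype(np.float32) / 255.0) ** gamma        # undo display gamma
    luma = lin.mean(axis=2, keepdims=True)
    boost = 1.0 + (max_boost - 1.0) * np.clip(luma, 0.0, 1.0) ** 4
    return lin * boost
```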
Article
Image stitching is a technique in which several images with overlapping fields of view are blended together to produce a high-resolution panoramic image. However, most image stitching methods require nearly exact overlaps between images and identical illumination to obtain picture-perfect results. Surveys of the field show that image stitching is still a challenging problem for panoramic images. Image and video stitching are active research areas in computer vision, computer graphics, and photography [1]. This article outlines the aspects of image and video stitching techniques, the different processing stages and approaches adopted, along with different views of map projection.
Article
We present an unsupervised visual feature learning algorithm driven by context-based pixel prediction. By analogy with auto-encoders, we propose Context Encoders -- a convolutional neural network trained to generate the contents of an arbitrary image region conditioned on its surroundings. In order to succeed at this task, context encoders need to both understand the content of the entire image, as well as produce a plausible hypothesis for the missing part(s). When training context encoders, we have experimented with both a standard pixel-wise reconstruction loss, as well as a reconstruction plus an adversarial loss. The latter produces much sharper results because it can better handle multiple modes in the output. We found that a context encoder learns a representation that captures not just appearance but also the semantics of visual structures. We quantitatively demonstrate the effectiveness of our learned features for CNN pre-training on classification, detection, and segmentation tasks. Furthermore, context encoders can be used for semantic inpainting tasks, either stand-alone or as initialization for non-parametric methods.
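A rough sketch of the combined objective described above (pixel-wise reconstruction on the masked region plus an adversarial term), assuming externally defined `generator` and `discriminator` PyTorch modules; the weighting and masking details are simplified:

```python
import torch
import torch.nn.functional as F

def context_encoder_losses(generator, discriminator, images, mask, adv_weight=0.001):
    """Joint loss for inpainting the masked region of `images`.

    images : (B, 3, H, W) ground-truth images
    mask   : (B, 1, H, W) tensor, 1 where pixels were removed from the input
    """
    inputs = images * (1.0 - mask)                 # context with the region cut out
    recon = generator(inputs)                      # network fills in the hole

    # Pixel-wise L2 reconstruction, restricted to the missing region
    rec_loss = F.mse_loss(recon * mask, images * mask)

    # Adversarial term: the discriminator should accept the inpainted result
    logits = discriminator(recon)
    adv_loss = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

    return rec_loss + adv_weight * adv_loss
```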
Conference Paper
Consistent illumination of virtual and real objects in augmented reality (AR) is essential to achieve visual coherence. This paper presents a practical method for rendering with consistent illumination in AR in two steps. In the first step, a user scans the surrounding environment by rotational motion of the mobile device and the real illumination is captured. We capture the real light in high dynamic range (HDR) to preserve its high contrast. In the second step, the captured environment map is used to precalculate a set of reflection maps on the mobile GPU which are then used for real-time rendering with consistent illumination. Our method achieves high quality of the reflection maps because the convolution of the environment map by the BRDF is calculated accurately for each pixel of the output map. Moreover, we utilize multiple render targets to calculate reflection maps for multiple materials simultaneously. The presented method for consistent illumination in AR is beneficial for increasing visual coherence between virtual and real objects. Additionally, it is highly practical for mobile AR as it uses only a commodity mobile device.
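The per-pixel convolution of an environment map by a BRDF can be illustrated with the simplest case, a brute-force Lambertian (cosine-lobe) prefilter; the paper's GPU implementation and material set are not reproduced here:

```python
import numpy as np

def prefilter_diffuse(env_dirs, env_radiance, env_solid_angles, out_dirs):
    """Brute-force cosine-lobe convolution of an environment map.

    env_dirs         : (N, 3) unit direction of every environment texel
    env_radiance     : (N, 3) RGB radiance of every environment texel
    env_solid_angles : (N,)   solid angle covered by every texel
    out_dirs         : (M, 3) unit normal per pixel of the output reflection map
    Returns (M, 3) diffuse reflection values (irradiance divided by pi).
    """
    out = np.empty((out_dirs.shape[0], 3), dtype=np.float32)
    for i, n in enumerate(out_dirs):
        cos_term = np.clip(env_dirs @ n, 0.0, None)        # n · l, clamped to upper hemisphere
        weights = cos_term * env_solid_angles              # per-texel contribution
        out[i] = (weights[:, None] * env_radiance).sum(axis=0) / np.pi
    return out
```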
Article
This paper proposes a method for real-time image de-blurring and panoramic image stitching. There are several existing methods for de-blurring and stitching images; however, many take several seconds to process on a standard computer. The method proposed in this paper applies parallel processing techniques to an existing de-blurring method and a new dynamics-based panoramic image stitching method to achieve real-time image processing. This method has been applied to a muscle-inspired camera orientation system that rapidly captures images and processes them in real time. This orientation system captures images while the camera is in motion, causing a blurring effect. Using estimated position data from the system dynamics of the camera orientation system, the point spread function of the image and the average image location can be estimated; hence, the image is de-blurred and stitched in real time. The proposed method makes use of parallel processing and precomputation techniques to greatly reduce the required processing time when compared to existing methods.
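A minimal sketch of the underlying idea, assuming a known linear camera motion: build a line-shaped point spread function from the estimated motion and invert the blur with a Wiener filter. This is a generic formulation, not the paper's parallelized pipeline:

```python
import numpy as np

def motion_psf(shape, dx, dy, taps=15):
    """Line-shaped point spread function for a known camera motion (dx, dy) in pixels."""
    psf = np.zeros(shape, dtype=np.float32)
    for t in np.linspace(0.0, 1.0, taps):
        psf[int(round(t * dy)) % shape[0], int(round(t * dx)) % shape[1]] += 1.0
    return psf / psf.sum()

def wiener_deblur(blurred, psf, snr=0.01):
    """Frequency-domain Wiener deconvolution of a single grayscale frame."""
    H = np.fft.fft2(psf, s=blurred.shape)
    B = np.fft.fft2(blurred)
    W = np.conj(H) / (np.abs(H) ** 2 + snr)      # Wiener filter
    return np.real(np.fft.ifft2(W * B))
```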
Article
We present a user-friendly image editing system that supports drag-and-drop object insertion (where the user merely drags objects into the image, and the system automatically places them in 3D and relights them appropriately), post-process illumination editing, and depth-of-field manipulation. Underlying our system is a fully automatic technique for recovering a comprehensive 3D scene model (geometry, illumination, diffuse albedo, and camera parameters) from a single, low dynamic range photograph. This is made possible by two novel contributions: an illumination inference algorithm that recovers a full lighting model of the scene (including light sources that are not directly visible in the photograph), and a depth estimation algorithm that combines data-driven depth transfer with geometric reasoning about the scene layout. A user study shows that our system produces perceptually convincing results, and achieves the same level of realism as techniques that require significant user interaction.
Conference Paper
We present a method that uses measured scene radiance and global illumination in order to add new objects to light-based models with correct lighting. The method uses a high dynamic range image-based model of the scene, rather than synthetic light sources, to illuminate the new objects. To compute the illumination, the scene is considered as three components: the distant scene, the local scene, and the synthetic objects. The distant scene is assumed to be photometrically unaffected by the objects, obviating the need for reflectance model information. The local scene is endowed with estimated reflectance model information so that it can catch shadows and receive reflected light from the new objects. Renderings are created with a standard global illumination method by simulating the interaction of light amongst the three components. A differential rendering technique allows for good results to be obtained when only an estimate of the local scene reflectance properties is known. We apply the general method to the problem of rendering synthetic objects into real scenes. The light-based model is constructed from an approximate geometric model of the scene and by using a light probe to measure the incident illumination at the location of the synthetic objects. The global illumination solution is then composited into a photograph of the scene using the differential rendering technique. We conclude by discussing the relevance of the technique to recovering surface reflectance properties in uncontrolled lighting situations. Applications of the method include visual effects, interior design, and architectural visualization.
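The differential rendering step itself is compact enough to sketch directly; a minimal version, assuming the two global-illumination renders and an object mask are already available:

```python
import numpy as np

def differential_composite(photo, render_with, render_without, object_mask):
    """Differential rendering composite in the spirit of the method above.

    photo          : (H, W, 3) background photograph of the real scene
    render_with    : (H, W, 3) global-illumination render including the synthetic objects
    render_without : (H, W, 3) the same render without the synthetic objects
    object_mask    : (H, W, 1) 1 where a synthetic object directly covers the pixel
    """
    # Local-scene pixels: add only the change in radiance caused by the objects
    # (shadows darken, reflected light brightens), keeping the real photo intact.
    delta = render_with - render_without
    composite = photo + delta
    # Object pixels: take the rendered object directly.
    return object_mask * render_with + (1.0 - object_mask) * composite
```

Because only the difference between the two renders touches the photograph, an approximate estimate of the local scene reflectance is sufficient, which is exactly the robustness the abstract points out.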
Conference Paper
Mixed reality applications which must provide visual coherence between synthetic and real objects need relighting solutions for both: synthetic objects have to match lighting conditions of their real counterparts, while real surfaces need to account for the change in illumination introduced by the presence of an additional synthetic object. In this paper we present a novel relighting solution called Delta Voxel Cone Tracing to compute both direct shadows and first bounce mutual indirect illumination. We introduce a voxelized, pre-filtered representation of the combined real and synthetic surfaces together with the extracted illumination difference due to the augmentation. In a final gathering step this representation is cone-traced and superimposed onto both types of surfaces, adding additional light from indirect bounces and synthetic shadows from anti-radiance present in the volume. The algorithm computes results at interactive rates, is temporally coherent and to our knowledge provides the first real-time rasterizer solution for mutual diffuse, glossy and perfect specular indirect reflections between synthetic and real surfaces in mixed reality.
Article
This paper presents a webcam-based spherical coordinate conversion system using OpenCL massively parallel computing for panoramic video image stitching. With its multi-core architecture and high-bandwidth memory access, a modern programmable GPU makes it possible to process multiple video images in parallel for real-time interaction. To obtain a 360-degree panoramic view, we use OpenCL to stitch multiple webcam video images into a panorama image and texture-map it onto a spherical object to compose an immersive virtual reality environment. The experimental results show that when an NVIDIA 9600GT is used to process eight 640×480 images, OpenCL achieves a speedup of roughly ninety times.
Conference Paper
This paper presents a panorama stitching system that uses only a single camera to generate an omnidirectional scene map for visual localization tasks. Instead of assuming that the overlapping area is constant for two adjacent images, the overlapping area is estimated and further refined using matched SURF feature pairs. By doing this, the system's dependency on the camera setup and its motion is reduced, making it more robust. To handle environments with highly symmetric or repeated features, regional SURF feature detection is applied to reduce false matches during SURF feature matching. To reduce distortion within the resulting panorama image, an overlapping-area-weighted image plane projection is used to create the panorama. Furthermore, to prevent ghosting when stitching the panorama, a dynamic programming algorithm is used to find an optimal path along which to stitch two adjacent images. The proposed algorithm has been tested in two indoor environments, and related qualitative and quantitative results are analyzed in detail.
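A minimal sketch of the matching-and-warping core, using ORB as a freely available stand-in for SURF and omitting the paper's regional feature detection and dynamic-programming seam search:

```python
import cv2
import numpy as np

def stitch_pair(img_a, img_b, min_matches=10):
    """Estimate a homography between two adjacent frames and warp B onto A's plane."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_b, des_a), key=lambda m: m.distance)
    if len(matches) < min_matches:
        raise RuntimeError("not enough feature matches to estimate the overlap")

    src = np.float32([kp_b[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_a[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    h, w = img_a.shape[:2]
    canvas = cv2.warpPerspective(img_b, H, (w * 2, h))   # crude canvas size
    canvas[:h, :w] = img_a                               # naive overwrite instead of seam finding
    return canvas
```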
We introduce the problem of scene viewpoint recognition, the goal of which is to classify the type of place shown in a photo, and also recognize the observer's viewpoint within that category of place. We construct a database of 360° panoramic images organized into 26 place categories. For each category, our algorithm automatically aligns the panoramas to build a full-view representation of the surrounding place. We also study the symmetry properties and canonical viewpoint of each place category. At test time, given a photo of a scene, the model can recognize the place category, produce a compass-like indication of the observer's most likely viewpoint within that place, and use this information to extrapolate beyond the available view, filling in the probable visual layout that would appear beyond the boundary of the photo.
Article
Cube mapping is a popular and promising environment mapping technique in which six cube faces containing the information of the entire environment are used to enhance the realism of virtual objects in an augmented reality system. In this paper, a new method is proposed which generates the six cube faces in real time from a preprocessed, unplanned image sequence captured by a calibrated camera. As part of the system initialization, a series of unplanned image sequences is recorded, together with the camera extrinsic parameters of each image. Whenever a virtual object is rendered, several images are chosen according to the camera's position and attitude to build an environment sphere model whose center coincides with the virtual object's. Each of the selected images is mapped to the corresponding area of the sphere's surface according to its orientation and the camera's up direction. Once the sphere's surface is fully covered, the model reflects the entire surrounding scene like a mirrored sphere. Using the equirectangular projection, the sphere's surface, parameterized by longitude and latitude, can be unfolded into a rectangular image called an equirectangular panorama. The six cube faces can then be segmented from the equirectangular panorama by a dedicated algorithm.
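The final unfolding step, slicing an equirectangular panorama into six cube faces, can be sketched as follows; the face orientation convention and nearest-neighbour sampling are simplifying assumptions:

```python
import numpy as np

def equirect_to_cube(pano, face_res=256):
    """Slice an equirectangular panorama (H, W, 3) into six cube faces."""
    H, W = pano.shape[:2]
    a, b = np.meshgrid(np.linspace(-1, 1, face_res), np.linspace(-1, 1, face_res))
    one = np.ones_like(a)
    face_dirs = {                 # face name -> (x, y, z) direction per pixel
        "px": ( one, -b, -a), "nx": (-one, -b,   a),
        "py": (   a, one,  b), "ny": (   a, -one, -b),
        "pz": (   a, -b, one), "nz": (  -a, -b, -one),
    }
    faces = {}
    for name, (x, y, z) in face_dirs.items():
        norm = np.sqrt(x * x + y * y + z * z)
        lon = np.arctan2(x, z)                        # [-pi, pi]
        lat = np.arcsin(y / norm)                     # [-pi/2, pi/2]
        u = ((lon / (2 * np.pi) + 0.5) * (W - 1)).astype(int)
        v = ((0.5 - lat / np.pi) * (H - 1)).astype(int)
        faces[name] = pano[v, u]                      # nearest-neighbour lookup
    return faces
```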
Article
Image-based lighting (IBL) is typically used for distant lighting represented by an infinite environment map. This technique has been used by many games. Games divide their scenes into several sectors and associate a cubemap (the environment mapping of choice due to its graphics hardware performance) with each of them. The cubemap of the sector where the camera is located is then used to light objects [1]. The problem with this approach is that it cannot accurately represent local reflections on specular and glossy objects. Our game requires more accurate local reflections, which implies that a local image-based lighting technique must be used. Previous local image-based lighting approaches, such as the Half-Life 2 approach [2], consist of assigning an individual cubemap to each object. These approaches suffer from lighting seams and parallax issues [3]. We introduce a new approach which avoids these artifacts while still preserving extremely high performance on current console generation hardware (PS3/Xbox 360).
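A common way to address the parallax issues mentioned above is to re-aim the reflection lookup at a proxy volume around the sector; the sketch below uses an axis-aligned box, which is an assumption rather than the exact approach of the article:

```python
import numpy as np

def parallax_corrected_lookup(position, reflection, box_min, box_max, cube_center):
    """Parallax correction of a reflection vector against an axis-aligned proxy box.

    position    : (3,) shaded world-space point
    reflection  : (3,) normalized reflection direction (no exactly-zero components assumed)
    box_min/max : (3,) the proxy volume approximating the sector geometry
    cube_center : (3,) position where the local cubemap was captured
    Returns the direction to use for the cubemap fetch.
    """
    # Ray/AABB intersection: distance to the nearest box exit along the reflection ray.
    inv = 1.0 / reflection
    t1 = (box_min - position) * inv
    t2 = (box_max - position) * inv
    t_exit = np.min(np.maximum(t1, t2))      # first exit of the slab intersection
    hit = position + t_exit * reflection     # point on the proxy geometry
    return hit - cube_center                 # re-aim the lookup from the capture point
```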
Article
For a photorealistic appearance of virtual objects rendered in augmented reality environments global illumination must be simulated. In this paper we present a real-time technique for generating reflections on virtual objects which are rendered into images or image sequences of real scenes. The basic idea of virtual reflections is to estimate the environment of a virtual object by extracting information from the image to be augmented. Based on this estimated environment, cube map textures are generated and applied by using environment mapping during rendering to simulate reflections on the virtual object. Although some work has been done to simulate global lighting conditions in real-time augmented reality systems, current approaches do not consider reflections of real world objects onto virtual objects. Virtual reflections are an easy to implement approximation of such reflections and can be combined with existing illumination techniques, such as the creation of shadows for augmented reality environments.
Article
We propose an analysis of numerical integration based on sampling theory, whereby the integration error caused by aliasing is suppressed by pre-filtering. We derive a pre-filter for evaluating the illumination integral yielding filtered importance sampling, a simple GPU-based rendering algorithm for image-based lighting. Furthermore, we extend the algorithm with real-time visibility computation. Free from any pre-computation, the algorithm supports fully dynamic scenes and, above all, is simple to implement.
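A sketch of the pre-filtering idea for a GGX specular lobe: each importance sample covers a solid angle determined by its PDF, and that solid angle selects a cube-map mip level. Constants and parameterization are illustrative, not taken from the paper:

```python
import numpy as np

def ggx_filtered_samples(roughness, num_samples=64, cube_res=256):
    """Filtered importance sampling set-up for a GGX lobe (N = V = R, roughness > 0).

    Returns per-sample half-vector angles and the cube-map mip level that
    pre-filters away the aliasing the integration would otherwise suffer from.
    """
    alpha = roughness * roughness
    u1 = (np.arange(num_samples) + 0.5) / num_samples        # stratified 1D samples
    u2 = (np.arange(num_samples) * 0.618034) % 1.0           # golden-ratio scramble
    cos_theta = np.sqrt((1.0 - u1) / (1.0 + (alpha * alpha - 1.0) * u1))
    phi = 2.0 * np.pi * u2

    # GGX normal distribution and the resulting PDF of the reflected direction
    d = alpha ** 2 / (np.pi * ((alpha ** 2 - 1.0) * cos_theta ** 2 + 1.0) ** 2)
    pdf = d / 4.0

    # Solid angle per sample vs. solid angle per cube-map texel -> mip level
    omega_s = 1.0 / (num_samples * pdf)
    omega_p = 4.0 * np.pi / (6.0 * cube_res * cube_res)
    mip = np.maximum(0.5 * np.log2(omega_s / omega_p), 0.0)
    return cos_theta, phi, mip
```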
Article
This tutorial reviews image alignment and image stitching algorithms. Image alignment (registration) algorithms can discover the large-scale (parametric) correspondence relationships among images with varying degrees of overlap. They are ideally suited for applications such as video stabilization, summarization, and the creation of large-scale panoramic photographs. Image stitching algorithms take the alignment estimates produced by such registration algorithms and blend the images in a seamless manner, taking care to deal with potential problems such as blurring or ghosting caused by parallax and scene movement as well as varying image exposures. This tutorial reviews the basic motion models underlying alignment and stitching algorithms, describes effective direct (pixel-based) and feature-based alignment algorithms, and describes blending algorithms used to produce seamless mosaics. It closes with a discussion of open research problems in the area.
Article
We present a method that uses measured scene radiance and global illumination in order to add new objects to light-based models with correct lighting. The method uses a high dynamic range image-based model of the scene, rather than synthetic light sources, to illuminate the new objects. To compute the illumination, the scene is considered as three components: the distant scene, the local scene, and the synthetic objects. The distant scene is assumed to be photometrically unaffected by the objects, obviating the need for reflectance model information. The local scene is endowed with estimated reflectance model information so that it can catch shadows and receive reflected light from the new objects. Renderings are created with a standard global illumination method by simulating the interaction of light amongst the three components. A differential rendering technique allows for good results to be obtained when only an estimate of the local scene reflectance properties is known. We apply the general method to the problem of rendering synthetic objects into real scenes. The light-based model is constructed from an approximate geometric model of the scene and by using a light probe to measure the incident illumination at the location of the synthetic objects. The global illumination solution is then composited into a photograph of the scene using the differential rendering technique. We conclude by discussing the relevance of the technique to recovering surface reflectance properties in uncontrolled lighting situations. Applications of the method include visual effects, interior design, and architectural visualization.
Conference Paper
Many methods are available for image mosaicing, most of which are not useful because they either (1) require a great deal of overlap between images, or (2) work only for a restricted sub-problem (translation, rotation, or zooming) and fail for the others. While numerous methods exist for accurately calculating translational shifts, no method was found that could handle rotations of angles greater than 15 degrees, or scaling. In this paper, a new method using Zernike moments is presented to solve for translational registration, rotational registration (2D and 3D), and zooming all at the same time. This method was tested on different sets of images with translational shifts, rotational shifts, and zooming. The method is very fast and efficient, does not require any human interaction, and the results proved to be very accurate.
Conference Paper
Cameras with bellows give photographers flexibility for controlling perspective, but once the picture is taken, its perspective is set. We introduce "virtual bellows" to provide control over perspective after a picture has been taken. Virtual bellows can be used to align images taken from different viewpoints, an important initial step in applications such as creating a high-resolution still image from video. We show how the virtual bellows, which implements the projective group, is an exact model fit to both pan and tilt. Specifically, we identify two important classes of image sequences accommodated by the virtual bellows. Examples of constructing high-quality stills are shown for the two cases: multiple frames taken of a flat object, and multiple frames taken from a fixed point.
Mistry, S., Patel, A.: Image Stitching using Harris Feature Detection. International Research Journal of Engineering and Technology (IRJET) 03(04) (2016)
Kale, P., Singh, K.R.: A Technical Analysis of Image Stitching Algorithm 6(1), 284-288 (2015)
Schwandt, T., Kunert, C., Broll, W.: Glossy Reflections for Mixed Reality Environments on Mobile Devices. In: Cyberworlds 2018. IEEE (2018). https://doi.org/10.1007/978-3-319-60928-7_30