Article

Advanced Volumetric Capture and Processing

Authors:
  • Fraunhofer Heinrich Hertz Institute (HHI)
  • Volucap GmbH

Abstract

Volumetric video is regarded worldwide as the next important development in media production, especially in the context of rapidly evolving virtual and augmented reality markets, where volumetric video is becoming a key technology. Fraunhofer Heinrich Hertz Institute (HHI) has developed a novel technology for volumetric video. The 3D Human Body Reconstruction (3DHBR) technology captures real persons with our volumetric capture system and creates naturally moving, dynamic 3D models that can then be observed from arbitrary viewpoints in a virtual or augmented reality scene. The capture system consists of an integrated multicamera and lighting system for full 360° acquisition. A cylindrical studio with a diameter of 6 m has been developed, equipped with 32 20-megapixel cameras and 120 light-emitting-diode (LED) panels that allow for an arbitrarily lit background. Hence, diffuse lighting and automatic keying are supported. Avoiding a green screen and providing diffuse lighting offers the best possible conditions for relighting the dynamic 3D models afterward, at the design stage of the virtual reality (VR) experience. In contrast to classical character animation, facial expressions and moving clothes are reconstructed at high geometrical detail and texture quality. The complete workflow is fully automatic, requires about 12 hr of processing per minute of mesh sequence, and provides a level of quality high enough for immediate integration into virtual scenes. Meanwhile, a second, professional studio has been built on the film campus in Potsdam-Babelsberg. This studio is operated by Volucap GmbH, a joint venture between Studio Babelsberg, ARRI, UFA, Interlake, and Fraunhofer HHI.
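As a rough illustration of how keying against a known, arbitrarily lit background can work without a green screen, the following is a minimal difference-keying sketch. It assumes a captured clean plate per camera; the function and parameters are illustrative, not HHI's actual algorithm, whose details the abstract does not give.

```python
import numpy as np

def difference_key(frame, clean_plate, threshold=0.08):
    """Hypothetical difference keying: separate the actor from a known,
    arbitrarily lit background by per-pixel color distance to a clean plate.

    frame, clean_plate: float32 RGB images in [0, 1], shape (H, W, 3)
    returns: alpha matte in [0, 1], shape (H, W)
    """
    # Per-pixel Euclidean distance in RGB between live frame and clean plate.
    dist = np.linalg.norm(frame - clean_plate, axis=-1)
    # Soft threshold: fully background below the threshold,
    # fully foreground above twice the threshold.
    alpha = np.clip((dist - threshold) / threshold, 0.0, 1.0)
    return alpha
```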


... Still, there is a problem: creating 3D models from scratch requires a great deal of human effort and time. To overcome this, various technologies for generating 3D models from 2D images have emerged, and 4D volumetric capture is attracting attention as the latest incarnation of this technology [15][16][17][18][19]. Guo et al. developed "The Relightables", a volumetric capture system for photorealistic, high-quality, relightable full-body performance capture. ...
... They also proposed a complete multi-view 3D processing chain for producing sequences of meshes of high quality in terms of geometrical detail and texture. Chen et al. enhanced a professional end-to-end volumetric video production pipeline to achieve high-fidelity human body reconstruction using only passive cameras [17]. DynamicFusion [19] reconstructs a 3D model in real time from depth images captured by a single depth sensor, gradually accumulating the depth information acquired by a single RGB-D camera. ...
Article
Full-text available
A sequence of 3D models generated using volumetric capture has the advantage of retaining the characteristics of dynamic objects and scenes. However, because 3D mesh and texture are synthesized independently for every frame of volumetric data, the mesh of every frame has a different shape, and the brightness and color quality of the texture vary. This paper proposes an algorithm to consistently create the meshes of 4D volumetric data using dynamic reconstruction. The proposed algorithm comprises remeshing, correspondence searching, and target-frame reconstruction by key-frame deformation. We enable non-rigid deformation by applying a surface deformation method to the key frame. Finally, we propose a method of compressing the target frame using the frame reconstructed from the key frame, with error rates of up to 98.88% and at least 20.39% compared to previous studies. The experimental results show the proposed method's effectiveness by measuring the geometric error between the deformed key frame and the target frame. Further, by calculating the residual between the two frames, the ratio of transmitted data is measured, showing a compression performance of 18.48%.
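To make the key-frame idea concrete, here is a minimal sketch of the residual step, under the assumption (stated in the abstract) that deformation gives the key frame and target frame corresponding vertices; the function name and return values are illustrative, not the paper's.

```python
import numpy as np

def keyframe_residual(deformed_key, target):
    """Hypothetical residual step of key-frame-based 4D mesh compression:
    once remeshing and correspondence search give both frames the same
    connectivity, only per-vertex displacements need to be transmitted.

    deformed_key, target: (N, 3) float arrays of corresponding vertices
    returns: (residual, rms_error)
    """
    residual = target - deformed_key           # per-vertex correction vectors
    rms = float(np.sqrt((residual ** 2).sum(axis=1).mean()))
    return residual, rms
```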
... Volumetric capture [57,123] and 3D reconstruction methods reconstruct the shape and the surface of an object or of a whole scene. These methods provide their best results if the captured images show the object from all sides. ...
... For example, if two actors stand one in front of the other, the closer one occludes the farther one. In essence, this differs from the methods proposed by Collet et al. [57], Dou et al. [58], and Schreer et al. [123], where the camera setup completely surrounds the scene. ...
Thesis
Full-text available
Starting in 2013, head-mounted displays in combination with virtual or augmented reality have become a mass phenomenon in the area of computer games. In contrast to classical displays, head-mounted displays enable a user to move around in a virtual world and to view the embedded objects from different perspectives. Therefore, the scene and all objects need to be represented as a 3D model or in a comparable representation (e.g. as a light-field). Creating and replaying such content is especially challenging for dynamic live-action scenes, possibly with human actors. Firstly, many cameras need to capture the scene simultaneously from many perspectives. Afterwards, specialized software processes the footage before a user can review the scene. Integrating a live-action scene captured with a multi-camera array into a virtual environment forms the central application of this work.

To reach this goal, this thesis reviews the whole processing pipeline, including camera setup, camera calibration, depth reconstruction and novel view synthesis, and optimizes the pipeline at critical points. A special focus is on the necessary software. Here, this thesis contributes a series of novel software plugins that enable a user in a creative environment (e.g. in the area of media production) to perform such a task. Such software needs to be robust, easy to use, and needs to integrate smoothly into existing workflows.

In the area of multi-camera calibration (chapter 4), this thesis contributes a novel, specialized and reliable calibration technique. The proposed self-calibration procedure does not rely on special calibration objects. Instead, the method exploits the high mechanical stability and manufacturing precision of the underlying camera array. In contrast to classical self-calibration techniques, the proposed method uses a new optimization concept that significantly reduces the number of unknown parameters. This concept is also beneficial for existing calibration procedures.

Multi-camera disparity estimation and filtering forms the second major chapter and adds two contributions. The first is a new algorithm for near-real-time depth reconstruction from planar camera arrays. With this algorithm (and the view-synthesis method from chapter 6), users, e.g. on a film set, can directly review the captured footage with proper motion parallax. This way, a creative director can detect potential problems with the artistic performance at an early stage. The second is a new concept for off-line multi-camera depth reconstruction. Instead of a monolithic algorithm, the concept builds on modular elements, where each module implements a specific function such as stereo matching or disparity filtering. A user in a post-production environment can then set up and parametrize a scene-specific depth-reconstruction pipeline. The plugin suite Realception for Nuke is a direct result of this thesis. Comparisons show that an adapted pipeline based on block matching as the initial depth-reconstruction technique competes with or even outperforms highly complex, monolithic depth-reconstruction methods.

Chapter 6 finally introduces the real-time rendering method, which is based on depth-image-based rendering. Here, the combination of light-field rendering and classical computer graphics pipelines forms the major contribution. In the absence of an explicit 3D model, this chapter proposes an intermediate screen element that transfers pixels coming from a light-field representation into a 3D environment that is essentially based on explicit geometric models (e.g. meshes). Experiments show that this approach yields high image quality. A realistic test shot and a demo setup finally demonstrate the functionality and quality of the whole capture and processing pipeline.
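As a hint of what a block-matching module at the start of such a modular pipeline looks like, here is a minimal sketch using OpenCV's standard block matcher. The file names are placeholders, and this is ordinary OpenCV usage, not the Realception implementation.

```python
import cv2

# Initial depth-reconstruction module: plain block matching on a rectified
# stereo pair; filtering/refinement modules would follow in a pipeline.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # placeholder file names
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
# StereoBM returns fixed-point disparities scaled by 16.
disparity = stereo.compute(left, right).astype("float32") / 16.0
```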
... com) and Unreal (https://www.unrealengine.com), specifically in the field of performance capture and animated film production (Schreer et al. 2019; Tricart 2018). This has changed not only the film industry but also the game engines themselves, with new pipelines for film, animation and cinematics in the latest engine releases. ...
Article
This paper proposes an emerging job role within the virtual reality (VR) film production crew. This role, which we call the Embodiment Director, involves assisting the VR Film Director in the accomplishment of truly immersive film experiences through the use of game engines and VR peripherals. The Embodiment Director will manage the inception of haptics and stimuli technologies that allow for the embodiment of humans within a virtual environment, and must guarantee precise synchronicity between physical and virtual counterparts while overseeing the safe use of software and hardware during the entire production process of VR film experiences. This paper offers a contemporary review of the key creative roles within traditional and virtual film production in order to build a concise and valid argument for the role of the Embodiment Director, supported by autoethnography.
... Using a large number of sample resources to train the network will affect the application of image restoration; combining traditional restoration methods with deep features, or training with small samples, to improve the quality of image restoration will be the direction of further research. It is believed that better restoration methods will be proposed. Research on virtual reality interaction and virtual reality interaction design, from the available information, seems to focus more on hardware, software technology, psychology, and human-computer interaction, which relates to the path and process of virtual reality development [22]. The interaction and interaction design in immersive virtual reality is a comprehensive cross-disciplinary combination of expertise in technology, art, and science, which are closely interlinked and have gradually developed on the basis of technology. ...
Article
Full-text available
In this paper, exploratory and innovative research is done on an implementation technique for artistic color virtual reality based on similarity-image restoration. Building on similarity images, a regularization term based on a nonlocal natural image prior is proposed to deal with the single-image blind deblurring problem. This paper designs a new artistic color virtual reality realization technology based on similarity-image restoration, which exploits the low-rank property of nonlocal similarity blocks in images and adds a strongly convex term to enhance the convexity of the artistic color virtual reality model. We analyze virtual reality interaction design from the perspective of artistic color design, sort out the concept and content of the design, analyze the elements, design principles, and evaluation criteria included in virtual reality interaction art color design, and explore its conceptual principles. A full understanding of the characteristics of the virtual reality interaction medium can help us to better use this medium as a tool to create works that aim to deliver higher quality and a richer experience through a perceptual communication method that is closer to natural interaction. Combining the power of technology, artistic color thinking, and a design approach paves the way forward. The study shows that virtual reality technology can effectively improve the status quo and promote the cultivation of professional practice ability in art color design, which is conducive to the cultivation of applied design talents.
... A similar approach is followed with Volucap [5], [6], but for the production of high-quality 3D movie assets and entire scenes. Here, a diffuse back-lit dome is equipped with 16 stereo camera pairs, each pair connected to a dedicated high-end PC. ...
Article
Full-text available
Volumetric videos allow for a true three-dimensional experience where users can freely choose their viewing angles and be actually immersed in a video clip. High-quality video productions are gaining attention, and the first volumetric video recordings are commercially provided at select places. Unfortunately, the production process is very time-, labour-, and technology-intensive, requiring specialist hardware, software, and production experts. A "YouTube-like" production, distribution, and experience system would be desirable. Here we present an approach which allows for the creation and interactive replay of three-dimensional video clips using a novel voxel-based platform—voxelvideos. We can show that our voxelvideos, experienced in virtual and augmented reality, are effective, enjoyable, and perceived as useful. We hope that our approach and findings will encourage researchers, media experts, and hobbyists to experiment with voxelvideos as a new form of affordable media production and experience.
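A minimal sketch of what turning a captured point cloud into one voxel frame could look like, in plain NumPy; this is illustrative only, as the paper's actual voxelvideo format is not specified in the abstract.

```python
import numpy as np

def voxelize(points, colors, voxel_size=0.02):
    """Hypothetical voxelization of one "voxelvideo" frame: quantize a
    point cloud onto a regular grid and average the colors per occupied cell.

    points: (N, 3) float positions; colors: (N, 3) floats in [0, 1]
    returns: dict mapping integer grid index -> mean RGB color
    """
    keys = np.floor(points / voxel_size).astype(np.int64)
    voxels = {}
    for key, color in zip(map(tuple, keys), colors):
        acc = voxels.setdefault(key, [np.zeros(3), 0])
        acc[0] += color                 # accumulate color per cell
        acc[1] += 1                     # count points per cell
    return {k: v[0] / v[1] for k, v in voxels.items()}
```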
... Capturing and displaying volumetric videos is becoming feasible [2,30]. Point clouds are frequently used as a data format for volumetric video in augmented reality (AR) and virtual reality (VR) applications. ...
Article
Volumetric video recordings of storytellers, when experienced in immersive virtual reality, can elicit a sense of copresence between the user and the storyteller. Combining a volumetric storyteller with an appropriate virtual environment presents a compelling experience that can convey the story with a depth that is hard to achieve with traditional forms of media. Volumetric video production remains difficult, time-consuming, and expensive, often excluding cultural groups who would benefit most. The difficulty is partly due to ever-increasing levels of visual detail in computer graphics, and resulting hardware and software requirements. A high level of detail is not a requirement for convincing immersive experiences, and by reducing the level of detail, experiences can be produced and delivered using readily available, nonspecialized equipment. By reducing computational requirements in this way, storytelling scenes can be created ad hoc and experienced immediately—this is what we are addressing with our approach. We present our portable real-time volumetric capture system, and our framework for using it to produce immersive storytelling experiences. The real-time capability of the system, and the low data rates resulting from lower levels of visual detail, allow us to stream volumetric video in real time to enrich experiences with embodiment (seeing oneself) and with copresence (seeing others). Our system has supported collaborative research with Māori partners with the aim of reconnecting the dispersed Māori population in Aotearoa, New Zealand to their ancestral land through immersive storytelling. We present our system in the context of this collaborative work.
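The abstract's argument that low visual detail enables real-time streaming can be illustrated with a toy per-frame packing scheme; the quantization choices below are assumptions for illustration, not the authors' wire format.

```python
import numpy as np

def pack_frame(points, colors, bounds_min, bounds_max):
    """Hypothetical per-frame packing for low-detail volumetric streaming:
    quantize positions to 16-bit integers within known scene bounds and
    colors to 8-bit, cutting the payload to under 40% of raw float32 arrays.

    points: (N, 3) floats; colors: (N, 3) floats in [0, 1]
    bounds_min, bounds_max: (3,) arrays bounding the capture volume
    """
    scale = bounds_max - bounds_min
    pos_q = np.round((points - bounds_min) / scale * 65535).astype(np.uint16)
    col_q = np.round(np.clip(colors, 0, 1) * 255).astype(np.uint8)
    return pos_q.tobytes() + col_q.tobytes()   # one network payload per frame
```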
Thesis
This dissertation presents Holographic Studio, a volumetric video capture and production system for performance capture whose output can be viewed from any perspective in virtual environments. This technology not only finds a natural application in the entertainment industry, such as video games and remote concerts, but also extends to other industries such as sports, fashion, culture, remote communication and education, and military and medical training, among others. The dissertation aims at reducing the gap between this technology, typically characterized by expensive hardware, scarce availability and high usage costs, and small and medium-sized companies that would benefit from more immersive content. The proposed infrastructure uses RGB-D sensors to capture depth data and integrates 3D data processing algorithms to clean and replicate live-action performances to be viewed in immersive virtual spaces. The implemented architecture provides a solid basis for future innovations in the photorealism of captures, in a scalable and replicable way.
Article
Full-text available
This paper presents a novel approach to recover true fine surface detail of deforming meshes reconstructed from multi-view video. Template-based methods for performance capture usually produce a coarse-to-medium scale detail 4D surface reconstruction which does not contain the real high-frequency geometric detail present in the original video footage. Fine scale deformation is often incorporated in a second pass by using stereo constraints, features, or shading-based refinement. In this paper, we propose an alternative solution to this second stage by formulating dense dynamic surface reconstruction as a global optimization problem of the densely deforming surface. Our main contribution is an implicit representation of a deformable mesh that uses a set of Gaussian functions on the surface to represent the initial coarse mesh, and a set of Gaussians for the images to represent the original captured multi-view images. We effectively find the fine scale deformations for all mesh vertices, which maximize photo-temporal-consistency, by densely optimizing our model-to-image consistency energy on all vertex positions. Our formulation yields a smooth closed form energy with implicit occlusion handling and analytic derivatives. Furthermore, it does not require error-prone correspondence finding or discrete sampling of surface displacement values. We demonstrate our approach on a variety of datasets of human subjects wearing loose clothing and performing different motions. We qualitatively and quantitatively demonstrate that our technique successfully reproduces finer detail than the input baseline geometry.
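In schematic form, the global optimization the abstract describes might be written as maximizing a photo-consistency energy between projected surface Gaussians and image Gaussians; the notation below is assumed for illustration and is not taken from the paper.

```latex
% Schematic objective (assumed notation): v_i are mesh vertex positions,
% G_i the Gaussian attached to vertex i, \Pi_k the projection into view k,
% and H^k_j the j-th Gaussian fitted to the image of view k.
\max_{\{v_i\}} \; E(\{v_i\}) \;=\;
  \sum_{k} \sum_{i} \sum_{j}
  \operatorname{sim}\!\bigl( \Pi_k(G_i(v_i)),\, H^k_j \bigr)
```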
Article
Full-text available
We describe a system for high-resolution capture of moving 3D geometry, beginning with dynamic normal maps from multiple views. The normal maps are captured using active shape-from-shading (photometric stereo), with a large lighting dome providing a series of novel spherical lighting configurations. To compensate for low-frequency deformation, we perform multi-view matching and thin-plate-spline deformation on the initial surfaces obtained by integrating the normal maps. Next, the corrected meshes are merged into a single mesh using a volumetric method. The final output is a set of meshes, which were impossible to produce with previous methods. The meshes exhibit details on the order of a few millimeters and represent the performance over human-size working volumes at a temporal resolution of 60 Hz.
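For reference, the classic Lambertian photometric-stereo step that underlies active shape-from-shading can be sketched in a few lines; this is the textbook formulation, not the paper's full dome pipeline.

```python
import numpy as np

def photometric_stereo(images, lights):
    """Minimal Lambertian photometric stereo (classic formulation):
    recover per-pixel surface normals from images taken under known
    directional lights, by solving I = L (albedo * n) in least squares.

    images: (K, H, W) intensities; lights: (K, 3) unit light directions
    returns: (H, W, 3) unit normals
    """
    K, H, W = images.shape
    I = images.reshape(K, -1)                      # stack pixels as columns
    G = np.linalg.lstsq(lights, I, rcond=None)[0]  # G = albedo * normal, (3, H*W)
    n = G / (np.linalg.norm(G, axis=0, keepdims=True) + 1e-8)
    return n.T.reshape(H, W, 3)
```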
Article
Full-text available
Many applications in computer graphics require complex, highly detailed models. However, the level of detail actually necessary may vary considerably. To control processing time, it is often desirable to use approximations in place of excessively detailed models. We have developed a surface simplification algorithm which can rapidly produce high-quality approximations of polygonal models. The algorithm uses iterative contractions of vertex pairs to simplify models and maintains surface error approximations using quadric matrices. By contracting arbitrary vertex pairs (not just edges), our algorithm is able to join unconnected regions of models. This can facilitate much better approximations, both visually and with respect to geometric error. In order to allow topological joining, our system also supports non-manifold surface models.
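The core of the quadric error metric is compact enough to sketch directly; this follows the construction described in the abstract (plane quadrics summed per vertex, contractions ranked by the quadratic form), with illustrative function names.

```python
import numpy as np

def plane_quadric(plane):
    """Quadric Q = p p^T for a plane p = (a, b, c, d) with a^2+b^2+c^2 = 1,
    as in quadric error metrics; a vertex quadric is the sum of the
    quadrics of its incident triangle planes."""
    p = np.asarray(plane, dtype=np.float64).reshape(4, 1)
    return p @ p.T

def vertex_error(Q, v):
    """Squared-distance error v^T Q v of homogeneous vertex v = (x, y, z, 1);
    vertex-pair contractions with the smallest error are collapsed first."""
    vh = np.append(np.asarray(v, dtype=np.float64), 1.0)
    return float(vh @ Q @ vh)
```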
Article
We present a flexible algorithmic pipeline for high-quality three-dimensional acquisition of dynamic real world objects. In this context we discuss the reduction of mesh complexity as one of the key challenges for visualizing reconstructed three-dimensional content in augmented and virtual reality applications.
Article
The target application of this paper is 3D scene reconstruction for future real-time production scenarios in the broadcast domain as well as future post-production and on-set visual-effect previews in the digital cinema area. Our approach is based on multiple trifocal camera capture systems which can be arbitrarily distributed on set. In this work we tackle the problem of multi-view data fusion from a real-time perspective. The novelty of our work is that, instead of performing pixel-wise processing, we consider patch groups as higher-level scene representations. Based on the robust results of the trifocal sub-systems, we implicitly obtain an optimized set of patch groups, even for partly occluded regions, by applying a simple geometric rule set. Furthermore, we show that a simplified meshing can be applied to the patch-group borders, which enables a GPU-centric real-time implementation. The presented algorithm is tested on real-world test-shoot data for the case of 3D reconstruction of humans.
Article
In future 3D videoconferencing systems, depth estimation is required to support autostereoscopic displays and, even more importantly, to provide eye contact. Real-time 3D video processing is currently possible, but only within some limits. Since traditional CPU-centred sub-pixel disparity estimation is computationally expensive, the depth resolution of fast stereo approaches is directly linked to pixel quantization and the selected stereo baseline. In this work we present a novel, highly parallelizable algorithm that is capable of dealing with arbitrary depth resolutions while avoiding texture-interpolation-related runtime penalties through a GPU-centred design. The cornerstone of our patch sweeping approach is the fusion of space sweeping and patch-based 3D estimation techniques. Especially for narrow-baseline multi-camera configurations, as commonly used in 3D videoconferencing systems (e.g. [1]), it preserves the strengths of both techniques and avoids their shortcomings at the same time. Moreover, we provide a sophisticated parameterization and quantization scheme that gives our algorithm very good scalability in terms of computation time and depth-estimation quality. Furthermore, we present an optimized CUDA implementation for a multi-GPU setup in a cluster environment. For each GPU, it performs three pairwise high-quality depth estimations for a trifocal narrow-baseline camera configuration on a 256x256 image block in real time.
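The sweeping idea, testing every depth (here, disparity) hypothesis and keeping the best patch score per pixel, can be shown with a deliberately simplified CPU version for a rectified stereo pair; the paper's GPU patch sweeping over trifocal configurations is considerably more involved.

```python
import numpy as np
from scipy.ndimage import convolve

def disparity_sweep(left, right, max_disp=64, patch=5):
    """Toy sweep-style disparity search (a simplified stand-in for the
    paper's GPU patch sweeping): test every disparity hypothesis, score it
    with a box-filtered absolute difference, and keep the best per pixel.

    left, right: (H, W) float grayscale images of a rectified stereo pair
    """
    H, W = left.shape
    cost = np.empty((max_disp, H, W))
    box = np.ones((patch, patch)) / patch**2
    for d in range(max_disp):
        shifted = np.roll(right, d, axis=1)          # hypothesize disparity d
        cost[d] = convolve(np.abs(left - shifted), box, mode="nearest")
        cost[d, :, :d] = np.inf                      # wrapped columns are invalid
    return cost.argmin(axis=0)                       # winner-take-all disparity map
```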
Conference Paper
We show that surface reconstruction from oriented points can be cast as a spatial Poisson problem. This Poisson formulation considers all the points at once, without resorting to heuristic spatial partitioning or blending, and is therefore highly resilient to data noise. Unlike radial basis function schemes, our Poisson approach allows a hierarchy of locally supported basis functions, and therefore the solution reduces to a well-conditioned sparse linear system. We describe a spatially adaptive multiscale algorithm whose time and space complexities are proportional to the size of the reconstructed model. Experimenting with publicly available scan data, we demonstrate reconstruction of surfaces with greater detail than previously achievable.
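Poisson surface reconstruction is widely implemented; as a usage example (not the authors' own code), Open3D exposes this formulation directly. The input file name is a placeholder, and the method expects points with oriented normals.

```python
import open3d as o3d

# Load a point cloud and make sure it has the oriented normals that the
# Poisson formulation requires.
pcd = o3d.io.read_point_cloud("scan.ply")   # placeholder file name
pcd.estimate_normals()

# depth controls the octree resolution of the adaptive multiscale solve.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=8)
```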