December 2024 · 2 Reads
December 2024 · 2 Reads
September 2024 · 18 Reads
Radiance fields are powerful and, hence, popular models for representing the appearance of complex scenes. Yet, constructing them from image observations gives rise to ambiguities and uncertainties. We propose a versatile approach for learning Gaussian radiance fields with explicit and fine-grained uncertainty estimates that impose only a small additional cost compared to uncertainty-agnostic training. Our key observation is that uncertainties can be modeled as a low-dimensional manifold in the space of radiance field parameters that is highly amenable to Monte Carlo sampling. Importantly, our uncertainties are differentiable and, thus, allow for gradient-based optimization of subsequent captures that optimally reduce ambiguities. We demonstrate state-of-the-art performance on next-best-view planning tasks, including high-dimensional illumination planning for optimal radiance field relighting quality.
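A minimal sketch of the core idea as one might implement it: uncertainty is captured by a low-rank basis around the mean field parameters, and per-pixel uncertainty is estimated by rendering Monte Carlo samples drawn from that manifold. The `render` callable, the basis, and all names below are illustrative placeholders, not the authors' implementation.

```python
import numpy as np

def mc_uncertainty(theta_mean, basis, render, camera, n_samples=32, rng=None):
    """Monte Carlo estimate of per-pixel rendering uncertainty.

    theta_mean : (P,) mean radiance-field parameters
    basis      : (P, K) low-rank directions spanning the uncertainty manifold (K << P)
    render     : callable(theta, camera) -> (H, W, 3) image; placeholder renderer
    """
    if rng is None:
        rng = np.random.default_rng(0)
    renders = []
    for _ in range(n_samples):
        z = rng.standard_normal(basis.shape[1])    # latent sample on the low-dim manifold
        theta = theta_mean + basis @ z             # perturbed radiance-field parameters
        renders.append(render(theta, camera))
    renders = np.stack(renders)                    # (N, H, W, 3)
    return renders.mean(axis=0), renders.var(axis=0)  # prediction and per-pixel variance
```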
January 2024 · 6 Reads · 2 Citations
January 2024 · 7 Reads · 4 Citations
December 2023 · 15 Reads · 17 Citations · ACM Transactions on Graphics
Inverse rendering, the process of inferring scene properties from images, is a challenging inverse problem. The task is ill-posed, as many different scene configurations can give rise to the same image. Most existing solutions incorporate priors into the inverse-rendering pipeline to encourage plausible solutions, but they do not consider the inherent ambiguities and the multi-modal distribution of possible decompositions. In this work, we propose a novel scheme that integrates a denoising diffusion probabilistic model pre-trained on natural illumination maps into an optimization framework involving a differentiable path tracer. The proposed method allows sampling from combinations of illumination and spatially-varying surface materials that are both natural and consistent with the image observations. We further conduct an extensive comparative study of different priors on illumination used in previous work on inverse rendering. Our method excels in recovering materials and producing highly realistic and diverse environment map samples that faithfully explain the illumination of the input images.
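One simple way such a diffusion prior could be coupled to a differentiable path tracer is a score-distillation-style regularizer on the current environment-map estimate, added to the image-reconstruction gradient. The sketch below assumes exactly that; the paper's actual sampling scheme may differ, and `render`, `diffusion_eps`, and the noise schedule are placeholders.

```python
import torch

def fit_scene(envmap, materials, render, diffusion_eps, images, steps=2000, lr=1e-2):
    """Fit an environment map and SVBRDF parameters to input images, regularized
    by a pre-trained diffusion prior over natural illumination maps."""
    envmap = envmap.clone().requires_grad_(True)
    materials = materials.clone().requires_grad_(True)
    opt = torch.optim.Adam([envmap, materials], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Data term: re-rendered images should match the observations.
        torch.nn.functional.mse_loss(render(envmap, materials), images).backward()
        # Prior term (score-distillation style): nudge the envmap toward the
        # manifold of natural illumination learned by the diffusion model.
        t = torch.randint(20, 980, (1,))
        noise = torch.randn_like(envmap)
        a = 1.0 - t.item() / 1000.0                      # crude stand-in for the schedule
        x_t = a ** 0.5 * envmap.detach() + (1.0 - a) ** 0.5 * noise
        with torch.no_grad():
            prior_grad = diffusion_eps(x_t, t) - noise   # predicted noise minus true noise
        envmap.grad += 0.1 * prior_grad                  # prior weight is a hyperparameter
        opt.step()
    return envmap.detach(), materials.detach()
```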
December 2023 · 21 Reads · 20 Citations · ACM Transactions on Graphics
Capturing and editing full-head performances enables the creation of virtual characters for applications such as extended reality and media production. The past few years have witnessed a steep rise in the photorealism of human head avatars. Such avatars can be controlled through different input data modalities, including RGB, audio, depth, IMUs, and others. While these data modalities provide effective means of control, they mostly focus on editing the head movements, such as the facial expressions, head pose, and/or camera viewpoint. In this paper, we propose AvatarStudio, a text-based method for editing the appearance of a dynamic full-head avatar. Our approach builds on existing work to capture dynamic performances of human heads using Neural Radiance Fields (NeRF) and edits this representation with a text-to-image diffusion model. Specifically, we introduce an optimization strategy for incorporating multiple keyframes, representing different camera viewpoints and time stamps of a video performance, into a single diffusion model. Using this personalized diffusion model, we edit the dynamic NeRF by introducing view-and-time-aware Score Distillation Sampling (VT-SDS), following a model-based guidance approach. Our method edits the full head in a canonical space and then propagates these edits to the remaining time steps via a pre-trained deformation network. We evaluate our method visually and numerically via a user study, and the results show that our method outperforms existing approaches. Our experiments validate the design choices of our method and highlight that our edits are genuine, personalized, and both 3D- and time-consistent.
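As an illustration of the canonical-space editing idea described above, one could propagate an edit made in canonical space to any time step by warping sample points through a pre-trained deformation network before querying the edited field. `deform` and `edited_canonical_field` below are hypothetical placeholders, not the AvatarStudio API.

```python
import torch

def render_edited_frame(query_points, t, deform, edited_canonical_field):
    """Evaluate an edited canonical radiance field at time step t.

    query_points           : (N, 3) sample points along camera rays at time t
    deform                 : pre-trained net mapping (points, t) -> canonical points
    edited_canonical_field : edited field returning (rgb, density) per canonical point
    """
    canonical_pts = deform(query_points, t)         # warp back into canonical space
    rgb, density = edited_canonical_field(canonical_pts)
    return rgb, density                             # volume-render these as usual
```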
October 2023 · 89 Reads · 3 Citations · International Journal of Computer Vision
Portrait viewpoint and illumination editing is an important problem with several applications in VR/AR, movies, and photography. Comprehensive knowledge of geometry and illumination is critical for obtaining photorealistic results. Current methods cannot explicitly model the scene in 3D while handling both viewpoint and illumination editing from a single image. In this paper, we propose VoRF, a novel approach that can take even a single portrait image as input and relight human heads under novel illuminations viewed from arbitrary viewpoints. VoRF represents a human head as a continuous volumetric field and learns a prior model of human heads using a coordinate-based MLP with individual latent spaces for identity and illumination. The prior model is learned in an auto-decoder manner over a diverse class of head shapes and appearances, allowing VoRF to generalize to novel test identities from a single input image. Additionally, VoRF has a reflectance MLP that uses the intermediate features of the prior model to render One-Light-at-A-Time (OLAT) images under novel views. We synthesize novel illuminations by combining these OLAT images with target environment maps. Qualitative and quantitative evaluations demonstrate the effectiveness of VoRF for relighting and novel view synthesis, even when applied to unseen subjects under uncontrolled illumination. This work is an extension of Rao et al. (VoRF: Volumetric Relightable Faces, 2022). We provide extensive evaluations and ablation studies of our model and also present an application in which any face can be relit using textual input.
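Because light transport is linear, relighting from OLAT renderings reduces to a weighted sum of the OLAT basis images, with weights drawn from the target environment map. The sketch below assumes exactly this simplification (sampling `envmap_fn` at each light direction); it is illustrative, not the paper's full pipeline.

```python
import numpy as np

def relight_from_olat(olat_images, light_dirs, envmap_fn):
    """Relight a subject by linearly combining One-Light-at-A-Time renderings.

    olat_images : (L, H, W, 3) rendering under each of the L individual lights
    light_dirs  : (L, 3) unit direction of each OLAT light
    envmap_fn   : callable(direction) -> (3,) RGB radiance of the target environment map
    """
    weights = np.stack([envmap_fn(d) for d in light_dirs])   # (L, 3) per-light RGB weights
    # Light transport is linear: the relit image is a weighted sum of the OLAT basis.
    return np.einsum('lhwc,lc->hwc', olat_images, weights)
```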
August 2023 · 70 Reads
Humans effortlessly infer the 3D shape of objects. What computations underlie this ability? Although various computational models have been proposed, none of them capture the human ability to match object shape across viewpoints. Here, we ask whether and how this gap might be closed. We begin with a relatively novel class of computational models, 3D neural fields, which encapsulate the basic principles of classic analysis-by-synthesis in a deep neural network (DNN). First, we find that a 3D Light Field Network (3D-LFN) supports 3D matching judgments well aligned with humans for within-category comparisons, adversarially-defined comparisons that accentuate the 3D failure cases of standard DNN models, and adversarially-defined comparisons for algorithmically generated shapes with no category structure. We then investigate the source of the 3D-LFN's ability to achieve human-aligned performance through a series of computational experiments. Exposure to multiple viewpoints of objects during training and a multi-view learning objective are the primary factors behind model-human alignment; even conventional DNN architectures come much closer to human behavior when trained with multi-view objectives. Finally, we find that while models trained with multi-view learning objectives are able to partially generalize to new object categories, they fall short of human alignment. This work provides a foundation for understanding human shape inferences within neurally mappable computational architectures and highlights important questions for future work.
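To make the "multi-view learning objective" finding concrete, a generic stand-in is a contrastive objective that pulls embeddings of different viewpoints of the same object together. The sketch below is illustrative only; it is not the rendering-based objective used by the 3D-LFN, and `encoder` is a placeholder.

```python
import torch
import torch.nn.functional as F

def multiview_matching_loss(encoder, views_a, views_b, temperature=0.1):
    """Contrastive multi-view objective: two renderings of the same object under
    different viewpoints should embed close together, different objects far apart.

    views_a, views_b : (B, C, H, W) paired viewpoints of B objects
    """
    za = F.normalize(encoder(views_a), dim=-1)      # (B, D) embeddings, view A
    zb = F.normalize(encoder(views_b), dim=-1)      # (B, D) embeddings, view B
    logits = za @ zb.t() / temperature              # cross-view similarity matrix
    targets = torch.arange(za.shape[0], device=za.device)
    return F.cross_entropy(logits, targets)         # matched pairs lie on the diagonal
```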
August 2023 · 2 Reads · Journal of Vision
July 2023 · 56 Reads · 184 Citations
... Implicit Shape Representations. Implicit shape representations are state-of-the-art in encoding shape geometric details [28,35,33,3,31,44,41]. To improve the shape modeling capability, researchers inject local-aware designs. ...
January 2024
... Facial makeup is an important aspect of human appearance. In computer vision and graphics, mainstream research focuses on makeup transfer [7-9, 13, 17, 18, 22, 25, 26, 28, 32, 42, 43, 50-52, 62], 3D makeup [16, 24, 30, 39, 56-58], and face verification [15, 40]. ...
December 2023 · ACM Transactions on Graphics
... Light estimation is a distinct research field [13,16,42,57,58,60,62,72]. Some methods represent lighting implicitly [24,69,78], limiting generalizability, while others require multi-view inputs and scene mesh data [42,60,72]. DPI [35] is related to our approach. This method combines differential rendering with Stable Diffusion for high-quality envmap generation but relies on multi-view NeRF methods for mesh reconstruction and lacks material BRDF data, reducing accuracy. ...
December 2023 · ACM Transactions on Graphics
... Feed-forward approaches for 3D reconstruction and rendering aim to generalize across scenes by learning from large datasets. Early works on generalizable NeRFs focus on object-level (Chibane et al., 2021; Johari et al., 2022; Reizenstein et al., 2021; Yu et al., 2021) and scene-level reconstruction (Suhail et al., 2022; Wang et al., 2021; Du et al., 2023). These methods typically rely on epipolar sampling or cost volumes to fuse multi-view features, requiring extensive point sampling for rendering, which results in slow speed and often unsatisfactory details. ...
June 2023
... While text-guided image editing demonstrates promising potential [5,7,17,22,32,35], it is constrained by the ambiguity of language instructions and the lack of precise spatial control, e.g., failing to accurately adjust the shape, position, or posture of a human. In contrast, interactive image editing [1,11,25,41,51] offers a more flexible and precise solution, which supports more intuitive operations like drawing sketches, clicking points, and dragging regions. ...
July 2023
... However, these methods do not natively allow for relighting, which is required for accurately compositing the face into different backgrounds or environments. Some extensions [Deng et al. 2023; Jiang et al. 2023; Pan et al. 2021, 2022; Ranjan et al. 2023] aim to disentangle the geometry and reflectance of the face from the environmental lighting by implicitly learning a subspace of intrinsic components like albedo, specularity, and normals, but do not model the light transport accurately enough with ground-truth disentangled data. In-the-wild images have a low dynamic range, non-linear photometric effects due to saturation and colored lighting, and different camera response curves. ...
September 2022
... Compared with neural SDF, these methods result in discontinuous artifacts during the reconstruction. NeRFactor [53], NeRD [3], Neural-PIL [4], NeRV [40], Neural Transfer Field [27], InvRender [54] and TensoIR [15] use a density field as the geometry representation and an environment tensor with Monte Carlo sampling for the light reconstruction. To solve the ambiguity of the base color and environment light, [8,19] show the importance of adding a material prior to inverse rendering. ...
October 2022
Lecture Notes in Computer Science
... Equipped with the neural scene representation and a differentiable rendering algorithm, 3D-aware GANs can produce multi-view consistent images [10,56,67]. Several approaches [9, 26, 51, 75, 80] adopt a two-stage rendering process, which leverages convolutional neural networks (CNNs) to increase the resolution of the image or neural rendering features, to generate 3D-aware images at higher resolution efficiently. ...
June 2022
... Given the bounding box of the entire scene, our approach first partitions the scene into several sub-regions, iterates through each sub-region, and uses location-specific sampling within the sub-region. To achieve this, we follow the concept of the Neural Ground Plan (NGP) [48] that assumes the scene can be represented as a flat surface and partition it into a two-dimensional grid from a top-down perspective. Instead of optimizing the whole scene, ours only optimizes a certain sub-region at a time with the location-specific sampling technique, which largely reduces the sampling range and thus improves the training efficiency. ...
July 2022
... Implicit neural representation models 3D scenes as differentiable continuous neural networks (Tewari et al., 2022). NeRF (Mildenhall et al., 2020) learns density and radiance field values of the scene supervised by 2D images. ...
May 2022