Preprint

V4D: Voxel for 4D Novel View Synthesis

Abstract

Neural radiance fields (NeRF) have achieved a remarkable breakthrough in novel view synthesis for static 3D scenes. However, in the 4D setting (e.g., dynamic scenes), the performance of existing methods is still limited by the capacity of the neural network, typically a multilayer perceptron (MLP). In this paper, we present a method that models the 4D neural radiance field with 3D voxels, abbreviated as V4D, where the voxels take two formats. The first regularly models the bounded 3D space and uses the sampled local 3D feature, together with a time index, to model the density field and the texture field. The second takes the form of look-up tables (LUTs) for pixel-level refinement, where the pseudo-surface produced by volume rendering serves as guidance for learning a 2D pixel-level refinement mapping. The proposed LUT-based refinement module achieves a performance gain at little computational cost and can serve as a plug-and-play module in the novel view synthesis task. Moreover, we propose a more effective conditional positional encoding for 4D data that improves performance with negligible computational overhead. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance by a large margin. Finally, the proposed V4D is also computationally friendly in both the training and testing phases, training 2x faster and rendering 10x faster than the previous state-of-the-art method.
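As a rough illustration of the first voxel format described above (a dense 3D feature grid sampled at query points and combined with a time index before small density and color heads), the following PyTorch sketch shows one plausible way to structure such a model. The grid resolution, feature width, time encoding, and head sizes are assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch (not the authors' code) of a voxel-based 4D radiance field:
# trilinearly sample a learnable 3D feature grid at each query point, concatenate
# a sinusoidal encoding of the time index, and predict density and view-dependent color.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VoxelRadianceField4D(nn.Module):
    def __init__(self, grid_res=128, feat_dim=16, time_freqs=4, hidden=64):
        super().__init__()
        # Learnable dense feature grid over the bounded 3D scene volume.
        self.grid = nn.Parameter(torch.zeros(1, feat_dim, grid_res, grid_res, grid_res))
        self.time_freqs = time_freqs
        in_dim = feat_dim + 2 * time_freqs  # voxel feature + sin/cos time encoding
        self.density_head = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.color_head = nn.Sequential(
            nn.Linear(in_dim + 3, hidden), nn.ReLU(), nn.Linear(hidden, 3))

    def encode_time(self, t):
        # Simple sinusoidal encoding of the scalar time index in [0, 1].
        freqs = torch.pi * 2.0 ** torch.arange(
            self.time_freqs, device=t.device, dtype=torch.float32)
        angles = t[:, None] * freqs[None, :]
        return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

    def forward(self, xyz, t, view_dir):
        # xyz: (N, 3) points normalized to [-1, 1]; t: (N,) times; view_dir: (N, 3).
        # Trilinearly sample local features from the voxel grid.
        grid_coords = xyz.view(1, -1, 1, 1, 3)
        feat = F.grid_sample(self.grid, grid_coords, align_corners=True)
        feat = feat.view(self.grid.shape[1], -1).t()            # (N, feat_dim)
        h = torch.cat([feat, self.encode_time(t)], dim=-1)
        sigma = F.softplus(self.density_head(h))                # per-point density
        rgb = torch.sigmoid(self.color_head(torch.cat([h, view_dir], dim=-1)))
        return sigma, rgb
```

A standard differentiable volume-rendering step, as in NeRF, would then composite the predicted densities and colors along each camera ray; the LUT-based pixel-level refinement described in the abstract would operate afterwards on the rendered 2D image.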


References
Article
Image quality measures are becoming increasingly important in the field of computer graphics. For example, there is currently a major focus on generating photorealistic images in real time by combining path tracing with denoising, for which such quality assessment is integral. We present FLIP, which is a difference evaluator with a particular focus on the differences between rendered images and corresponding ground truths. Our algorithm produces a map that approximates the difference perceived by humans when alternating between two images. FLIP is a combination of modified existing building blocks, and the net result is surprisingly powerful. We have compared our work against a wide range of existing image difference algorithms and we have visually inspected over a thousand image pairs that were either retrieved from image databases or generated in-house. We also present results of a user study which indicate that our method performs substantially better, on average, than the other algorithms. To facilitate the use of FLIP, we provide source code in C++, MATLAB, NumPy/SciPy, and PyTorch.
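The snippet below is illustrative only and is not the FLIP metric described above (which models human perception when alternating between two images); it simply shows the kind of input and per-pixel output such a difference evaluator works with.

```python
# Naive stand-in for a perceptual difference evaluator: mean absolute per-pixel
# error between a rendered image and its reference, in [0, 1]. Not FLIP itself.
import numpy as np

def naive_difference_map(rendered: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """rendered, reference: (H, W, 3) arrays in [0, 1]; returns an (H, W) error map."""
    return np.abs(rendered.astype(np.float32) - reference.astype(np.float32)).mean(axis=-1)
```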
Article
Recent years have witnessed the increasing popularity of learning-based photo enhancement methods. However, existing methods either deliver unsatisfactory results or consume too much computational and memory resources, hindering their application to high-resolution images in practice. In this paper, we learn image-adaptive 3-dimensional lookup tables (3D LUTs) to achieve fast and robust photo enhancement. 3D LUTs are widely used for manipulating color and tone of photos, but they are usually manually tuned and fixed in camera imaging pipeline or photo editing tools. We, for the first time to our best knowledge, propose to learn 3D LUTs from annotated data. More importantly, our learned 3D LUT is image-adaptive. We learn multiple basis 3D LUTs and a small convolutional neural network (CNN) simultaneously in an end-to-end manner. The small CNN predicts content-dependent weights to fuse the multiple basis 3D LUTs into an image-adaptive one, which is employed to transform the source images efficiently. Our model contains less than 0.6 million parameters and runs at a speed of 602 FPS at 4K resolution using one Titan RTX GPU. While being highly efficient, our model also significantly outperforms the state-of-the-art photo enhancement methods in terms of PSNR, SSIM and color difference on two benchmark datasets.
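The following hedged sketch illustrates the image-adaptive 3D LUT idea summarized above: a small CNN predicts per-image weights that fuse several learnable basis LUTs, and the fused LUT is applied to the input by trilinear interpolation. The LUT size, number of basis LUTs, and weight-predictor architecture here are illustrative assumptions, not the cited paper's exact model.

```python
# Sketch of an image-adaptive 3D LUT: fuse learnable basis LUTs with CNN-predicted
# weights, then look up each pixel's RGB value in the fused LUT via grid_sample.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveLUT(nn.Module):
    def __init__(self, n_basis=3, lut_size=33):
        super().__init__()
        # Basis LUTs: (n_basis, 3 output channels, S, S, S) over normalized RGB space.
        self.basis = nn.Parameter(torch.rand(n_basis, 3, lut_size, lut_size, lut_size))
        # Tiny weight predictor operating on a downsampled copy of the image.
        self.predictor = nn.Sequential(
            nn.AdaptiveAvgPool2d(16), nn.Flatten(),
            nn.Linear(3 * 16 * 16, 64), nn.ReLU(), nn.Linear(64, n_basis))

    def forward(self, img):
        # img: (B, 3, H, W) with values in [0, 1].
        w = torch.softmax(self.predictor(img), dim=-1)               # (B, n_basis)
        lut = torch.einsum('bn,ncdhw->bcdhw', w, self.basis)         # fused per-image LUT
        # Use the image's RGB values as 3D coordinates into the LUT, scaled to [-1, 1].
        coords = img.permute(0, 2, 3, 1) * 2.0 - 1.0                 # (B, H, W, 3)
        coords = coords.unsqueeze(1)                                 # (B, 1, H, W, 3)
        out = F.grid_sample(lut, coords, align_corners=True)         # (B, 3, 1, H, W)
        return out.squeeze(2)
```

Because the heavy lifting is a single trilinear lookup per pixel, this design keeps inference cheap even at high resolution, which is the property V4D's LUT-based refinement module builds on.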
Article
The problem of phase retrieval, i.e., the recovery of a function given the magnitude of its Fourier transform, arises in various fields of science and engineering, including electron microscopy, crystallography, astronomy, and optical imaging. Exploring phase retrieval in optical settings, specifically when the light originates from a laser, is natural since optical detection devices [e.g., charge-coupled device (CCD) cameras, photosensitive films, and the human eye] cannot measure the phase of a light wave. This is because, generally, optical measurement devices that rely on converting photons to electrons (current) do not allow for direct recording of the phase: the electromagnetic field oscillates at rates of ~10^15 Hz, which no electronic measurement device can follow. Indeed, optical measurement/detection systems measure the photon flux, which is proportional to the magnitude squared of the field, not the phase. Consequently, measuring the phase of optical waves (electromagnetic fields oscillating at 10^15 Hz and higher) involves additional complexity, typically by requiring interference with another known field, in the process of holography.
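A small illustration (not from the cited article) of why such detectors lose phase: multiplying a field by a pure phase factor leaves the measured intensity |F|^2 unchanged, so two different fields can produce identical measurements.

```python
# Demonstration that magnitude-squared (intensity) measurements discard phase:
# applying a global phase to the field does not change |FFT(field)|^2.
import numpy as np

rng = np.random.default_rng(0)
field = rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))
shifted = field * np.exp(1j * 0.7)            # same field with an extra global phase

intensity_a = np.abs(np.fft.fft2(field)) ** 2
intensity_b = np.abs(np.fft.fft2(shifted)) ** 2
print(np.allclose(intensity_a, intensity_b))  # True: the phase is not observable
```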
Article
We argue that the study of human vision should be aimed at determining how humans perform natural tasks with natural images. Attempts to understand the phenomenology of vision from artificial stimuli, although worthwhile as a starting point, can lead to faulty generalizations about visual systems, because of the enormous complexity of natural images. Dealing with this complexity is daunting, but Bayesian inference on structured probability distributions offers the ability to design theories of vision that can deal with the complexity of natural images, and that use 'analysis by synthesis' strategies with intriguing similarities to the brain. We examine these strategies using recent examples from computer vision, and outline some important implications for cognitive science.
Eric R Chan, Marco Monteiro, Petr Kellnhofer, Jiajun Wu, and Gordon Wetzstein. pi-gan: Periodic implicit generative adversarial networks for 3d-aware image synthesis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5799-5809, 2021.
Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. Tensorf: Tensorial radiance fields. arXiv preprint arXiv:2203.09517, 2022.
Frank Dellaert and Lin Yen-Chen. Neural volume rendering: Nerf and beyond. arXiv preprint arXiv:2101.05204, 2020.
Boyang Deng, Jonathan T. Barron, and Pratul P. Srinivasan. JaxNeRF: an efficient JAX implementation of NeRF, 2020.
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
Tianye Li, Mira Slavcheva, Michael Zollhoefer, Simon Green, Christoph Lassner, Changil Kim, Tanner Schmidt, Steven Lovegrove, Michael Goesele, and Zhaoyang Lv. Neural 3d video synthesis. arXiv preprint arXiv:2103.02597, 2021.
Lingjie Liu, Jiatao Gu, Kyaw Zaw Lin, Tat-Seng Chua, and Christian Theobalt. Neural sparse voxel fields. Advances in Neural Information Processing Systems, 33:15651-15663, 2020.
Yuan Liu, Sida Peng, Lingjie Liu, Qianqian Wang, Peng Wang, Christian Theobalt, Xiaowei Zhou, and Wenping Wang. Neural rays for occlusion-aware image-based rendering. arXiv preprint arXiv:2107.13421, 2021.
Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. In European conference on computer vision, pages 405-421. Springer, 2020.
Thomas Neff, Pascal Stadlbauer, Mathias Parger, Andreas Kurz, Joerg H Mueller, Chakravarty R Alla Chaitanya, Anton Kaplanyan, and Markus Steinberger. Donerf: Towards real-time rendering of compact neural radiance fields using depth oracle networks. In Computer Graphics Forum, volume 40, pages 45-59. Wiley Online Library, 2021.
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
Songyou Peng, Michael Niemeyer, Lars Mescheder, Marc Pollefeys, and Andreas Geiger. Convolutional occupancy networks. In Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part III 16, pages 523-540. Springer, 2020.
Konstantinos Rematas, Andrew Liu, Pratul P Srinivasan, Jonathan T Barron, Andrea Tagliasacchi, Thomas Funkhouser, and Vittorio Ferrari. Urban radiance fields. arXiv preprint arXiv:2111.14643, 2021.
Katja Schwarz, Yiyi Liao, Michael Niemeyer, and Andreas Geiger. Graf: Generative radiance fields for 3d-aware image synthesis. Advances in Neural Information Processing Systems, 33:20154-20166, 2020.
Ruizhi Shao, Hongwen Zhang, He Zhang, Yanpei Cao, Tao Yu, and Yebin Liu. Doublefield: Bridging the neural surface and radiance fields for high-fidelity human rendering. arXiv preprint arXiv:2106.03798, 2021.
Vincent Sitzmann, Semon Rezchikov, Bill Freeman, Josh Tenenbaum, and Fredo Durand. Light field networks: Neural scene representations with single-evaluation rendering. Advances in Neural Information Processing Systems, 34, 2021.