The Visual Computer
https://doi.org/10.1007/s00371-024-03660-4
RESEARCH
Polynomial for real-time rendering of neural radiance fields
Liping Zhu¹ · Haibo Zhou¹ · Silin Wu¹ · Tianrong Cheng¹ · Hongjun Sun¹
Accepted: 17 September 2024
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024

Corresponding author: Haibo Zhou, 1910055855@qq.com
Liping Zhu, zhuliping@cup.edu.cn · Silin Wu, 2022211284@student.cup.edu.cn · Tianrong Cheng, 2422184962@qq.com · Hongjun Sun, sunhj68@cup.edu.cn

¹ Beijing Key Laboratory of Petroleum Data Mining, China University of Petroleum (Beijing), Beijing 102249, China
Abstract
In neural radiance fields (NeRF), generating highly realistic rendering results requires extensive sampling of rays and online queries of multilayer perceptrons, which results in slow rendering speeds. Previous research has addressed this issue by designing faster evaluation of neural scene representations or by precomputing scene properties to reduce rendering time. In this paper, we propose a real-time rendering method called PNeRF. PNeRF utilizes continuous polynomial functions to approximate spatial volume density and color information. Additionally, we separate the view-direction information from the rendering equation, leading to a new expression for the volume rendering equation. By taking the starting coordinates of the observation viewpoint and the observation direction vector as inputs to the neural network, we obtain the rendering result for the corresponding observation ray; thus, rendering each ray requires only a single forward inference of the neural network. To further improve rendering speed, we design a six-axis spherical method to store the rendering results corresponding to the starting coordinates of the observation viewpoint and the observation direction vector. This allows us to significantly improve rendering speed while maintaining rendering quality, with minimal storage space requirements. Experimental validation on the LLFF dataset demonstrates that our method improves rendering speed while preserving rendering quality and requiring minimal storage space. These results indicate the potential of our method for real-time rendering, providing an effective solution for more efficient rendering.

Keywords: Real-time rendering · Continuous polynomial functions · Volume rendering · Six-axis spherical
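As a rough sketch of the rendering interface described in this abstract (hypothetical code; the architecture, layer sizes, and names below are illustrative assumptions, not the authors' implementation), a PNeRF-style renderer maps a ray origin and view direction directly to a color in one forward pass, instead of querying an MLP at every sample along the ray:

import torch
import torch.nn as nn

class RayColorNet(nn.Module):
    """Hypothetical single-query-per-ray network in the spirit of PNeRF.

    Input: ray origin o in R^3 and unit view direction d in R^3.
    Output: the rendered RGB color of that ray, so rendering needs one
    forward pass per ray rather than one MLP query per sample.
    """

    def __init__(self, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, origins: torch.Tensor, dirs: torch.Tensor) -> torch.Tensor:
        # origins, dirs: (num_rays, 3) -> returns (num_rays, 3)
        return self.mlp(torch.cat([origins, dirs], dim=-1))

The six-axis spherical cache described above would then store such per-ray outputs keyed by (origin, direction), so repeated views can skip even this single network call; its exact layout is not reproduced here.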
1 Introduction
Research on 3D scenes has diverse applications [1–13]. Recent work has explored implicit, coordinate-based neural networks as 3D representations, opening up promising new avenues for neural rendering. Examples of such approaches include neural volumes [14] and NeRF [15], which can model the 3D properties of scene objects from
a set of calibrated images. Specifically, NeRF represents the scene as a continuous volumetric function, enabling high-quality rendering of novel views from arbitrary viewing angles, including non-Lambertian effects. The NeRF function is parameterized by a multilayer perceptron (MLP) that maps continuous 3D positions to the corresponding volume densities and view-dependent color information. The success of NeRF has spurred a wealth of follow-up research that addresses some of its limitations, such as handling dynamic scenes [16, 17], scene editing [18, 19], and relighting [20, 21].
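In standard NeRF notation (general background rather than this paper's formulation), that scene function is the view-dependent mapping

\[
F_{\Theta} : (\mathbf{x}, \mathbf{d}) \mapsto (\sigma, \mathbf{c}), \qquad \mathbf{x} \in \mathbb{R}^{3}, \ \mathbf{d} \in \mathbb{S}^{2},
\]

where \sigma is the volume density at position \mathbf{x} and \mathbf{c} is the RGB color emitted toward the viewing direction \mathbf{d}.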
A common challenge for NeRF-based research is slow rendering speed, primarily due to the high sampling requirements and expensive neural network queries: rendering a single ray requires querying the network for the mapping at every sampled 5D coordinate. Various approaches have been explored to improve the computational efficiency of NeRF [22–25], yielding some progress in rendering speed, but interactive rendering requirements are still far from being met.
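Concretely, in the standard volume rendering quadrature used by NeRF (general background, not the reformulated equation proposed in this paper), the color of a ray \mathbf{r}(t) = \mathbf{o} + t\mathbf{d} is estimated from N samples as

\[
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_{i}\left(1 - e^{-\sigma_{i}\delta_{i}}\right)\mathbf{c}_{i},
\qquad
T_{i} = \exp\!\Big(-\sum_{j=1}^{i-1} \sigma_{j}\delta_{j}\Big),
\]

where each pair (\sigma_{i}, \mathbf{c}_{i}) comes from one MLP query and \delta_{i} is the distance between adjacent samples. Every pixel therefore costs N network evaluations, which is the per-ray bottleneck that the single-inference formulation above is designed to remove.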
In this work, we propose a real-time rendering method
with minimal storage cost. Our approach represents volume
Article
Distributed parallel rendering provides a valuable way to navigate large-scale scenes. However, previous works typically focused on outputting ultra-high-resolution images. In this paper, we target on improving the interactivity of navigation and propose a large-scale scene navigation method, GuideRender, based on multi-modal view frustum movement prediction. Given previous frames, user inputs and object information, GuideRender first extracts frames, user inputs and objects features spatially and temporally using the multi-modal extractor. To obtain effective fused features for prediction, we introduce an attentional guidance fusion module to fuse these features of different domains with attention. Finally, we predict the movement of the view frustum based on the attentional fused features and obtain its future state for loading data in advance to reduce latency. In addition, to facilitate GuideRender, we design an object hierarchy hybrid tree for scene management based on the object distribution and hierarchy, and an adaptive virtual sub-frustum decomposition method based on the relationship between the rendering cost and the rendering node capacity for task decomposition. Experimental results show that GuideRender outperforms baselines in navigating large-scale scenes. We also conduct a user study to show that our method satisfies the navigation requirements in large-scale scenes.
Article
Direct volume rendering (DVR) is a technique that emphasizes structures of interest (SOIs) within a volume visually, while simultaneously depicting adjacent regional information, e.g., the spatial location of a structure concerning its neighbors. In DVR, transfer function (TF) plays a key role by enabling accurate identification of SOIs interactively as well as ensuring appropriate visibility of them. TF generation typically involves non-intuitive trial-and-error optimization of rendering parameters, which is time-consuming and inefficient. Attempts at mitigating this manual process have led to approaches that make use of a knowledge database consisting of pre-designed TFs by domain experts. In these approaches, a user navigates the knowledge database to find the most suitable pre-designed TF for their input volume to visualize the SOIs. Although these approaches potentially reduce the workload to generate the TFs, they, however, require manual TF navigation of the knowledge database, as well as the likely fine tuning of the selected TF to suit the input. In this work, we propose a TF design approach, CBR-TF, where we introduce a new content-based retrieval (CBR) method to automatically navigate the knowledge database. Instead of pre-designed TFs, our knowledge database contains volumes with SOI labels. Given an input volume, our CBR-TF approach retrieves relevant volumes (with SOI labels) from the knowledge database; the retrieved labels are then used to generate and optimize TFs of the input. This approach largely reduces manual TF navigation and fine tuning. For our CBR-TF approach, we introduce a novel volumetric image feature which includes both a local primitive intensity profile along the SOIs and regional spatial semantics available from the co-planar images to the profile. For the regional spatial semantics, we adopt a convolutional neural network to obtain high-level image feature representations. For the intensity profile, we extend the dynamic time warping technique to address subtle alignment differences between similar profiles (SOIs). Finally, we propose a two-stage CBR scheme to enable the use of these two different feature representations in a complementary manner, thereby improving SOI retrieval performance. We demonstrate the capabilities of our CBR-TF approach with comparison with a conventional approach in visualization, where an intensity profile matching algorithm is used, and also with potential use-cases in medical volume visualization.
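As background for the profile-alignment step mentioned above, the base dynamic time warping (DTW) algorithm that CBR-TF extends can be sketched as follows (a generic textbook version in Python, not the paper's variant):

import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1D sequences.

    Generic illustration only; CBR-TF extends DTW to handle subtle
    alignment differences between similar intensity profiles.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of insertion, deletion, or match from the previous cells.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Two intensity profiles that match up to local stretching align cheaply.
print(dtw_distance([0, 1, 2, 3, 2, 1], [0, 1, 1, 2, 3, 2, 1]))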
Article
Neural radiance fields enable state-of-the-art photorealistic view synthesis. However, existing radiance field representations are either too compute-intensive for real-time rendering or require too much memory to scale to large scenes. We present a Memory-Efficient Radiance Field (MERF) representation that achieves real-time rendering of large-scale scenes in a browser. MERF reduces the memory consumption of prior sparse volumetric radiance fields using a combination of a sparse feature grid and high-resolution 2D feature planes. To support large-scale unbounded scenes, we introduce a novel contraction function that maps scene coordinates into a bounded volume while still allowing for efficient ray-box intersection. We design a lossless procedure for baking the parameterization used during training into a model that achieves real-time rendering while still preserving the photorealistic view synthesis quality of a volumetric radiance field.
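For context, a representative scene contraction of the kind mentioned above is the norm-based function popularized by mip-NeRF 360; MERF's actual contraction is a per-coordinate variant of it, so the form below is only illustrative:

\[
\operatorname{contract}(\mathbf{x}) =
\begin{cases}
\mathbf{x}, & \lVert \mathbf{x} \rVert \le 1, \\[4pt]
\left(2 - \dfrac{1}{\lVert \mathbf{x} \rVert}\right)\dfrac{\mathbf{x}}{\lVert \mathbf{x} \rVert}, & \lVert \mathbf{x} \rVert > 1,
\end{cases}
\]

which maps all unbounded scene coordinates into a ball of radius 2 that a finite feature grid can cover.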
Article
Radiance Field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos. However, achieving high visual quality still requires neural networks that are costly to train and render, while recent faster methods inevitably trade off speed for quality. For unbounded and complete scenes (rather than isolated objects) and 1080p resolution rendering, no current method can achieve real-time display rates. We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times and importantly allow high-quality real-time (≥ 30 fps) novel-view synthesis at 1080p resolution. First, starting from sparse points produced during camera calibration, we represent the scene with 3D Gaussians that preserve desirable properties of continuous volumetric radiance fields for scene optimization while avoiding unnecessary computation in empty space; Second, we perform interleaved optimization/density control of the 3D Gaussians, notably optimizing anisotropic covariance to achieve an accurate representation of the scene; Third, we develop a fast visibility-aware rendering algorithm that supports anisotropic splatting and both accelerates training and allows realtime rendering. We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets.
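For context, each primitive in such a representation is an anisotropic 3D Gaussian, conventionally written as

\[
G(\mathbf{x}) = \exp\!\left(-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})\right),
\qquad
\Sigma = R\,S\,S^{\top}R^{\top},
\]

where the covariance \Sigma is factored into a rotation R and a diagonal scale matrix S so that it stays positive semi-definite during the anisotropic optimization described above.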
Article
Recently, neural architecture search (NAS) has attracted great interest in academia and industry. It remains a challenging problem due to the huge search space and computational costs. Recent studies in NAS mainly focused on the usage of weight sharing to train a SuperNet once. However, the corresponding branch of each subnetwork is not guaranteed to be fully trained. It may not only incur huge computation costs but also affect the architecture ranking in the retraining procedure. We propose a multi-teacher-guided NAS, which proposes to use the adaptive ensemble and perturbation-aware knowledge distillation algorithm in the one-shot-based NAS algorithm. The optimization method aiming to find the optimal descent directions is used to obtain adaptive coefficients for the feature maps of the combined teacher model. Besides, we propose a specific knowledge distillation process for optimal architectures and perturbed ones in each searching process to learn better feature maps for later distillation procedures. Comprehensive experiments verify our approach is flexible and effective. We show improvement in precision and search efficiency in the standard recognition dataset. We also show improvement in correlation between the accuracy of the search algorithm and true accuracy by NAS benchmark datasets.
Article
Recent transformer-based models, especially patch-based methods, have shown huge potentiality in vision tasks. However, the split fixed-size patches divide the input features into the same size patches, which ignores the fact that vision elements are often various and thus may destroy the semantic information. Also, the vanilla patch-based transformer cannot guarantee the information communication between patches, which will prevent the extraction of attention information with a global view. To circumvent those problems, we propose an Efficient Attention Pyramid Transformer (EAPT). Specifically, we first propose the Deformable Attention, which learns an offset for each position in patches. Thus, even with split fixed-size patches, our method can still obtain non-fixed attention information that can cover various vision elements. Then, we design the Encode-Decode Communication module (En-DeC module), which can obtain communication information among all patches to get more complete global attention information. Finally, we propose a position encoding specifically for vision transformers, which can be used for patches of any dimension and any length. Extensive experiments on the vision tasks of image classification, object detection, and semantic segmentation demonstrate the effectiveness of our proposed model. Furthermore, we also conduct rigorous ablation studies to evaluate the key components of the proposed structure.