Publications (17) · 6.64 Total impact
- 01/2011: pages 413-442;
- ABSTRACT: We present a software system that enables path-traced rendering of complex scenes. The system consists of two primary components: an application layer that implements the basic rendering algorithm, and an out-of-core scheduling and data-management layer designed to assist the application layer in exploiting hybrid computational resources (e.g., CPUs and GPUs) simultaneously. We describe the basic system architecture, discuss design decisions of the system's data-management layer, and outline an efficient implementation of a path tracer application, where GPUs perform functions such as ray tracing, shadow tracing, importance-driven light sampling, and surface shading. The use of GPUs speeds up the runtime of these components by factors ranging from two to twenty, resulting in a substantial overall increase in rendering speed. The path tracer scales well with respect to CPUs, GPUs and memory per node as well as scaling with the number of nodes. The result is a system that can render large complex scenes with strong performance and scalability.
Computer Graphics Forum 03/2009; 28(2):385 - 396. · 1.64 Impact Factor
Article: Fast BVH Construction on GPUs
ABSTRACT: We present two novel parallel algorithms for rapidly constructing bounding volume hierarchies on manycore GPUs. The first uses a linear ordering derived from spatial Morton codes to build hierarchies extremely quickly and with high parallel scalability. The second is a top-down approach that uses the surface area heuristic (SAH) to build hierarchies optimized for fast ray tracing. Both algorithms are combined into a hybrid algorithm that removes existing bottlenecks in the algorithm for GPU construction performance and scalability, leading to significantly decreased build time. The resulting hierarchies are close in quality to optimized SAH hierarchies, but the construction process is substantially faster, leading to a significant net benefit when both construction and traversal cost are accounted for. Our preliminary results show that current GPU architectures can compete with CPU implementations of hierarchy construction running on multicore systems. In practice, we can construct hierarchies of models with up to several million triangles and use them for fast ray tracing or other applications.
Computer Graphics Forum 03/2009; 28(2):375 - 384. · 1.64 Impact Factor
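As a rough illustration of the "linear ordering derived from spatial Morton codes" mentioned in the abstract above, the sketch below computes 30-bit Morton codes for points in the unit cube and sorts primitives by them. It is a sequential Python sketch, not the paper's GPU implementation; the function names, the 10-bits-per-axis quantization, and the assumption that coordinates are already normalized to [0, 1) are all illustrative choices.

```python
def expand_bits(v: int) -> int:
    """Spread the low 10 bits of v so that bit i moves to bit 3*i."""
    v &= 0x3FF
    v = (v | (v << 16)) & 0x030000FF
    v = (v | (v << 8)) & 0x0300F00F
    v = (v | (v << 4)) & 0x030C30C3
    v = (v | (v << 2)) & 0x09249249
    return v


def morton_code(ix: int, iy: int, iz: int) -> int:
    """Interleave three 10-bit cell indices into one 30-bit code (x is most significant)."""
    return (expand_bits(ix) << 2) | (expand_bits(iy) << 1) | expand_bits(iz)


def quantize(c: float) -> int:
    """Map a coordinate assumed to lie in [0, 1) to a cell index in 0..1023."""
    return min(max(int(c * 1024), 0), 1023)


def morton_order(points):
    """Return the indices of `points` sorted along the Morton (Z-order) curve."""
    codes = [morton_code(*(quantize(c) for c in p)) for p in points]
    return sorted(range(len(points)), key=codes.__getitem__)
```

Sorting by these codes places spatially nearby primitives near each other in the array, which is what makes splitting the sorted range into subtrees cheap.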
Article: Fast BVH Construction on GPUs.
Comput. Graph. Forum. 01/2009; 28:375-384.
- ABSTRACT: We demonstrate an efficient data-parallel algorithm for building large hash tables of millions of elements in real-time. We consider two parallel algorithms for the construction: a classical sparse perfect hashing approach, and cuckoo hashing, which packs elements densely by allowing an element to be stored in one of multiple possible locations. Our construction is a hybrid approach that uses both algorithms. We measure the construction time, access time, and memory usage of our implementations and demonstrate real-time performance on large datasets: for 5 million key-value pairs, we construct a hash table in 35.7 ms using 1.42 times as much memory as the input data itself, and we can access all the elements in that hash table in 15.3 ms. For comparison, sorting the same data requires 36.6 ms, but accessing all the elements via binary search requires 79.5 ms. Furthermore, we show how our hashing methods can be applied to two graphics applications: 3D surface intersection for moving data and geometric hashing for image matching.
ACM Transactions on Graphics. 01/2009; 28:154:1--154:9.
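The cuckoo hashing idea named in the abstract above (each element may live in one of several candidate slots, and an insertion can evict an occupant to its alternative slot) can be sketched sequentially as follows. This is a minimal single-threaded illustration with two hash functions and integer keys, not the paper's parallel or hybrid construction; the class name, the hash family, and the `max_kicks` limit are assumptions made for the example.

```python
import random


class CuckooTable:
    """Sequential cuckoo hash table for integer keys (illustrative sketch)."""

    PRIME = 2_147_483_647

    def __init__(self, capacity, max_kicks=64, seed=0):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.rng = random.Random(seed)
        self.slots = [None] * capacity
        self._pick_hashes()

    def _pick_hashes(self):
        # Two random functions of the form ((a*key + b) mod p) mod capacity.
        self.params = [(self.rng.randrange(1, self.PRIME), self.rng.randrange(self.PRIME))
                       for _ in range(2)]

    def _locations(self, key):
        return [((a * key + b) % self.PRIME) % self.capacity for a, b in self.params]

    def get(self, key, default=None):
        # A lookup probes at most two slots.
        for loc in self._locations(key):
            entry = self.slots[loc]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default

    def insert(self, key, value):
        for loc in self._locations(key):
            entry = self.slots[loc]
            if entry is not None and entry[0] == key:
                self.slots[loc] = (key, value)  # key already present: update in place
                return
        item, loc = (key, value), self._locations(key)[0]
        for _ in range(self.max_kicks):
            item, self.slots[loc] = self.slots[loc], item  # place item, pick up occupant
            if item is None:
                return
            first, second = self._locations(item[0])
            loc = second if loc == first else first  # evicted item tries its other slot
        self._rehash(item)

    def _rehash(self, pending):
        # Eviction chain too long: choose new hash functions and reinsert everything.
        items = [e for e in self.slots if e is not None] + [pending]
        self.slots = [None] * self.capacity
        self._pick_hashes()
        for k, v in items:
            self.insert(k, v)
```

With only two hash functions the table should be kept well under half full; denser packing generally needs more candidate locations per key.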
Article: Resolution-Matched Shadow Maps
ABSTRACT: This article presents resolution-matched shadow maps (RMSM), a modified adaptive shadow map (ASM) algorithm, that is practical for interactive rendering of dynamic scenes. Adaptive shadow maps, which build a quadtree of shadow samples to match the projected resolution of each shadow texel in eye space, offer a robust solution to projective and perspective aliasing in shadow maps. However, their use for interactive dynamic scenes is plagued by an expensive iterative edge-finding algorithm that takes a highly variable amount of time per frame and is not guaranteed to converge to a correct solution. This article introduces a simplified algorithm that is up to ten times faster than ASMs, has more predictable performance, and delivers more accurate shadows. Our main contribution is the observation that it is more efficient to forgo the iterative refinement analysis in favor of generating all shadow texels requested by the pixels in the eye-space image. The practicality of this approach is based on the insight that, for surfaces continuously visible from the eye, adjacent eye-space pixels map to adjacent shadow texels in quadtree shadow space. This means that the number of contiguous regions of shadow texels (which can be efficiently generated with a rasterizer) is proportional to the number of continuously visible surfaces in the scene. Moreover, these regions can be coalesced to further reduce the number of render passes required to shadow an image. The secondary contribution of this paper is demonstrating the design and use of data-parallel algorithms inseparably mixed with traditional graphics programming to implement a novel interactive rendering algorithm. For the scenes described in this paper, we achieve 60--80 frames per second on static scenes and 20--60 frames per second on dynamic scenes for 512² and 1024² images with a maximum effective shadow resolution of 32,768² texels.
ACM Transactions on Graphics. 10/2007; 26:20:1--20:17.
Chapter: Parallel Prefix Sum (Scan) with CUDA
08/2007: pages 851--876;
Conference Paper: Scan Primitives for GPU Computing
ABSTRACT: The scan primitives are powerful, general-purpose data-parallel primitives that are building blocks for a broad range of applications. We describe GPU implementations of these primitives, specifically an efficient formulation and implementation of segmented scan, on NVIDIA GPUs using the CUDA API. Using the scan primitives, we show novel GPU implementations of quicksort and sparse matrix-vector multiply, and analyze the performance of the scan primitives, several sort algorithms that use the scan primitives, and a graphical shallow-water fluid simulation using the scan framework for a tridiagonal matrix solver.
Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware; 08/2007
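For readers unfamiliar with segmented scan, the sketch below shows one common way to express it: as an ordinary scan over (head flag, value) pairs with an associative combining operator. It is a sequential Python reference, not the CUDA formulation from the paper; the names `seg_op` and `segmented_scan` and the head-flag convention (a truthy flag marks the first element of a segment) are assumptions for illustration.

```python
from itertools import accumulate


def seg_op(left, right):
    """Associative operator on (head_flag, value) pairs for segmented sum-scan."""
    left_flag, left_value = left
    right_flag, right_value = right
    # A head flag on the right operand restarts the running sum.
    return (left_flag or right_flag, right_value if right_flag else left_value + right_value)


def segmented_scan(values, head_flags):
    """Inclusive sum-scan that restarts wherever head_flags is truthy."""
    pairs = zip((bool(f) for f in head_flags), values)
    return [value for _, value in accumulate(pairs, seg_op)]
```

Because `seg_op` is associative, the same operator can in principle be dropped into any parallel scan algorithm, which is what makes segmented scan no harder to parallelize than a plain scan.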
Conference Paper: Glift: Generic Data Structures for the GPU
Proceedings of the 2006 Workshop on Edge Computing Using New Commodity Architectures; 05/2006
Conference Paper: A Work-Efficient Step-Efficient Prefix Sum Algorithm
ABSTRACT: The Prefix-sum algorithm [Hillis and Steele Jr. 1986] is one of the most important building blocks for data-parallel computation. Its applications include parallel implementations of deleting marked elements from an array (stream-compaction), radix-sort, solving recurrence equations, solving tri-diagonal linear systems, and quick-sort. In addition to being a useful building block, the prefix-sum algorithm is a good example of a computation that seems inherently sequential, but for which there are efficient data-parallel algorithms.
05/2006
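To make the abstract above concrete, here is a small Python simulation of the Hillis and Steele style of scan it cites, followed by stream compaction built on top of it. Each pass of the loop stands in for one data-parallel step; this is a sketch of the classic step-efficient scheme only, not the work-efficient algorithm the paper itself proposes, and the function names are illustrative.

```python
def hillis_steele_scan(values):
    """Inclusive prefix sum in about log2(n) passes; every pass is data-parallel."""
    out = list(values)
    offset = 1
    while offset < len(out):
        # Each element at index i >= offset adds the element `offset` places to its left.
        # Building a new list from the old one mimics double buffering on a GPU.
        out = [out[i] + out[i - offset] if i >= offset else out[i]
               for i in range(len(out))]
        offset *= 2
    return out


def compact(values, keep):
    """Stream compaction: keep elements where keep(v) is true, preserving order."""
    flags = [1 if keep(v) else 0 for v in values]
    positions = hillis_steele_scan(flags)
    out = [None] * (positions[-1] if positions else 0)
    for i, v in enumerate(values):
        if flags[i]:
            # Inclusive scan minus one gives the output index of each kept element.
            out[positions[i] - 1] = v
    return out
```

The scatter loop in `compact` has no dependencies between iterations, so once the scan is available every element can compute its destination independently.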
- ACM Trans. Graph. 01/2006; 25:60-99.
- ACM Transactions on Graphics 01/2006; 26(1):60-99. · 3.36 Impact Factor
Conference Paper: Dynamic Adaptive Shadow Maps on Graphics Hardware
ABSTRACT: We present a novel implementation of adaptive shadow maps (ASMs) that performs all shadow lookups and scene analysis on the GPU, enabling interactive rendering with ASMs while moving both the light and camera. Adaptive shadow maps [Fernando et al. 2001] offer a rigorous solution to projective and perspective shadow map aliasing while maintaining the simplicity of a purely image-based technique. The complexity of the ASM data structure, however, has prevented full GPU-based implementations until now. Our approach uses an entirely GPU-based data structure and a blend of graphics and GPU stream programming. We support shadow map effective resolutions up to 131,072, and, unlike previous implementations, provide smooth transitions between resolution levels by trilinearly filtering (mipmapping) the shadow lookups.
ACM SIGGRAPH 2005 Sketches; 08/2005
Conference Paper: Octree Textures on Graphics Hardware
ABSTRACT: We implement an interactive 3D painting application that stores paint in an octree-like GPU-based adaptive data structure. Interactive painting of complex or unparameterized surfaces is an important problem in the digital film community. Many models used in production environments are either difficult to parameterize or are unparameterized implicit surfaces. We address this problem with a system that allows interactive 3D painting of complex, unparameterized models. The included movie demonstrates interactive painting of an 817k polygon model (as shown in Figure 1) with effective paint resolutions varying between 64³ to 2048³. Our implementation differs from previous work [Benson and Davis 2002; Carr and Hart 2004; DeBry et al. 2002; Lefebvre et al. 2004] in two important ways: first, it uses an adaptive data structure implemented entirely on the GPU, and second, it enables interactive performance with high quality by supporting quadlinear (mipmapped) filtering and fast, constant-time data accesses.
Technical Sketches Program, ACM SIGGRAPH 2005; 08/2005
- ABSTRACT: Scan and segmented scan algorithms are crucial building blocks for a great many data-parallel algorithms. Segmented scan and related primitives also provide the necessary support for the flattening transform, which allows for nested data-parallel programs to be compiled into flat data-parallel languages. In this paper, we describe the design of efficient scan and segmented scan parallel primitives in CUDA for execution on GPUs. Our algorithms are designed using a divide-and-conquer approach that builds all scan primitives on top of a set of primitive intra-warp scan routines. We demonstrate that this design methodology results in routines that are simple, highly efficient, and free of irregular access patterns that lead to memory bank conflicts. These algorithms form the basis for current and upcoming releases of the widely used CUDPP library.
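The divide-and-conquer structure described in the abstract above, where larger scans are assembled from small fixed-width scans, can be mimicked in plain Python as below. This is only a sequential sketch of the general idea and not CUDPP code: `WARP_SIZE = 32`, the function names, and the use of `itertools.accumulate` as a stand-in for an intra-warp scan routine are all assumptions.

```python
from itertools import accumulate

WARP_SIZE = 32  # assumed chunk width, standing in for a GPU warp


def warp_scan(chunk):
    """Stand-in for an intra-warp inclusive scan over at most WARP_SIZE values."""
    return list(accumulate(chunk))


def block_scan(values):
    """Inclusive scan of any length, built only from warp-sized scans."""
    chunks = [values[i:i + WARP_SIZE] for i in range(0, len(values), WARP_SIZE)]
    partial = [warp_scan(c) for c in chunks]       # 1. scan each chunk independently
    totals = [p[-1] for p in partial]              # 2. gather each chunk's total
    scanned = block_scan(totals) if len(totals) > WARP_SIZE else warp_scan(totals)
    offsets = [0] + scanned[:-1]                   # 3. exclusive scan of the totals
    # 4. add each chunk's offset to every element of that chunk
    return [x + off for p, off in zip(partial, offsets) for x in p]
```

Steps 1 and 4 touch each chunk independently, so on a GPU they map naturally onto one warp per chunk, with step 3 as the only cross-chunk communication.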
University of California, Davis
Davis, CA, United States
- Department of Electrical and Computer Engineering