Erik Brunvand

University of Utah | UOU · School of Computing

PhD

About

123
Publications
27,958
Reads
1,603
Citations


Publications (123)
Article
Reports on the history and development of computer aided design systems. In quick succession between 1964 and 1971, our field saw the proposal of Moore’s law, the coining of the term “computer architecture,” and the introduction of the first microprocessor. For much of the five decades since then, we have benefitted extraordinarily from both the...
Article
Full-text available
Data movement, particularly access to the main memory, has been the bottleneck of most computing problems. Ray tracing is no exception. We propose an unconventional solution that combines a ray ordering scheme that minimizes access to the scene data with a large on-chip buffer acting as near-compute storage that is spread over multiple chips. We de...
Article
Bounding volume hierarchies (BVH) are the most widely used acceleration structures for ray tracing due to their high construction and traversal performance. However, the bounding planes shared between parent and children bounding boxes are an inherent storage redundancy that limits further improvement in performance due to the memory cost of reading...
Article
Water issues are especially meaningful in the Western United States, with a long history of struggle, controversy, and politics. Achieving desirable outcomes in terms of water quality and water rights requires collaboration and compromise at all points in the discussion. Collective Currents is an interactive art installation, created in a collabora...
Conference Paper
Image glitching and data-bending are used to introduce image formats, data manipulation, and data visualization to beginning CS students and non-major students taking computing courses with no coding required.
Conference Paper
Full-text available
We propose an unconventional solution to high-performance ray tracing that combines a ray ordering scheme that minimizes access to the scene data with a large on-chip buffer acting as near-compute storage that is spread over multiple chips. We demonstrate the effectiveness of our approach by introducing Mach-RT (Many chip - Ray Tracing), a new hard...
Conference Paper
Computer Science and Computer Engineering classes related to digital circuits, embedded systems, Human Computer Interaction (HCI), and a wide variety of "maker" subjects, would often like to include physical computing projects. Extending these physical computing ideas to physical realization of circuits is the next logical step, and has traditional...
Conference Paper
Full-text available
Optimizations for ray tracing have typically focused on decreasing the time taken to render each frame. However, in modern computer systems it may actually be more important to minimize the energy used, or some combination of energy and render time. Understanding the time and energy costs per ray can enable the user to make conscious trade offs bet...
Article
Full-text available
Optimizations for ray tracing have typically focused on decreasing the time taken to render each frame. However, in modern computer systems it may actually be more important to minimize the energy used, or some combination of energy and render time. Understanding the time and energy costs per ray can enable the user to make conscious trade-offs bet...
Conference Paper
Full-text available
SimTRaX is a simulation infrastructure for simultaneous exploration of highly parallel accelerator architectures and how applications map to them. The infrastructure targets both cycle-accurate and functional simulation of architectures with thousands of simple cores that may share expensive computation and memory resources. A modified LLVM backend...
Data
SimTRaX is a simulation infrastructure for simultaneous exploration of highly parallel accelerator architectures and how applications map to them. The infrastructure targets both cycle-accurate and functional simulation of architectures with thousands of simple cores that may share expensive computation and memory resources. A modified LLVM backend...
Article
Full-text available
We introduce a new motion blur computation method for ray tracing that provides an analytical approximation of motion blurred visibility per ray. Rather than relying on timestamped rays and Monte Carlo sampling to resolve the motion blur, we associate a time interval with rays and directly evaluate when and where each ray intersects with animated o...
Article
There is mounting evidence that manufacturing energy and environmental costs are a growing factor in the overall energy footprint of computing systems. The quantification of these impacts requires the evaluation of both the manufacturing and use phase energy/environmental costs of major integrated circuit (IC) components, including processing units...
Conference Paper
Full-text available
Hardware acceleration for ray tracing has been a topic of great interest in computer graphics. However, even with proposed custom hardware, the inherent irregularity in the memory access pattern of ray tracing has limited its performance, compared with rasterization on commercial GPUs. We provide a different approach to hardware-accelerated ray tra...
Conference Paper
We describe our experience designing and delivering a general education technological fluency course that frames the discussion of computer science and engineering technology (electronics and programming) in the context of sound-art: art that uses sound as its medium. This course is aimed at undergraduate students from a wide variety of backgrounds...
Article
We describe our experience designing and delivering a general education technological fluency course that frames the discussion of computer science and engineering technology (electronics and programming) in the context of sound-art: art that uses sound as its medium. This course is aimed at undergraduate students from a wide variety of backgrounds...
Conference Paper
The growing do-it-yourself movement relies heavily on electronics, code, and data. Abundant resources and low cost materials result in people of all ages seeking to learn how to master these modern creative supplies. We will discuss how computer science educators might tap into teachable moments provided by this voluntary, informal, enthusiastic mo...
Conference Paper
There is mounting evidence that manufacturing energy and environmental costs are a growing factor in the overall energy footprint of computing systems. The quantification of these impacts requires the evaluation of both the manufacturing and use phase energy/environmental costs of major integrated circuit (IC) components, including processing units...
Conference Paper
General education curricula at many universities require students to take courses in wide ranging areas outside of their specific majors. Conspicuously missing from many of these curricula, however, are engineering and technology courses. As part of a program sponsored by our Office of Undergraduate Studies at the University of Utah, I am developin...
Conference Paper
Embedded systems classes and labs can often benefit from having students design their own systems including printed circuit boards (PCBs). These boards can be the basis of either complete small microcontroller systems or add-on boards to existing platforms. However, modern circuit components are very often available only in tiny surface-mount techn...
Conference Paper
Are fine arts and technology compatible partners? Do these disciplines support each other or flinch when they are combined like oil and water? Do collaborative efforts provide interesting insights and opportunities for students? For practitioners? There seems to be an explosion of interest in exploring arts and technology connections: new media, di...
Article
Full-text available
We propose two hardware mechanisms to decrease energy consumption on massively parallel graphics processors for ray tracing. First, we use a streaming data model and configure part of the L2 cache into a ray stream memory to enable efficient data processing through ray reordering. This increases L1 hit rates and reduces off-chip memory energy subst...
Article
Speculatorum Oculi (The Eyes of Spies) comments on current surveillance activities of governments and corporations through an installation that includes an architectural model surveilled with looming video cameras providing live feeds to a set of video monitors. These monitors show views of the model and of other video cameras placed around the ins...
Conference Paper
The definition of "computer graphics" as used by artists in new media and kinetic areas of the arts is much more expansive than simply rendering to a screen. A visit to the SIGGRAPH art gallery, for example, will showcase a wide variety of uses of computing, embedded control, sensors, and actuators in the service of art. Kinetic art using embedded...
Conference Paper
The definition of "computer graphics" as used by artists in new media and kinetic areas of the arts is much more expansive than simply rendering to a screen. A visit to the SIGGRAPH 2013 Art Gallery, for example, reveals a wide variety of uses of physical computing, embedded control, sensors, and actuators in the service of art. This course is for...
Conference Paper
Full-text available
We propose two hardware mechanisms to decrease energy consumption on massively parallel graphics processors for ray tracing while keeping performance high. First, we use a streaming data model and configure part of the L2 cache into a ray stream memory to enable efficient data processing through ray reordering. This increases the L1 hit rate and re...
Conference Paper
This workshop introduces embedded programming and hardware using Arduino in a creative context to make machines that make drawings. This is a powerful way to introduce programming and physical computing concepts to students from high school to undergraduate and to students who might not normally be intrigued by a computing course. Participants expe...
Poster
Full-text available
Prediction of radio frequency (RF) energy propagation in the presence of complex outdoor terrain features—urban environments, for example—is of great interest when planning, optimizing and analyzing wireless networks. A tool for fast prediction could improve network coverage, provide estimates of signal strength throughout the environment, estimate...
Article
Full-text available
Large-scale chip multiprocessors will likely be heterogeneous. It has been suggested by several groups that it may be worthwhile to implement some cores that are specially tuned to execute common code patterns. One such common application that will execute on all future processors is of course the operating system. Many future workloads will spend...
Article
Full-text available
Bounding volume hierarchies (BVHs) are a popular acceleration structure choice for animated scenes rendered with ray tracing. This is due to the relative simplicity of refitting bounding volumes around moving geometry. However, the quality of such a refitted tree can degrade rapidly if objects in the scene deform or rearrange significantly as the a...
Conference Paper
Full-text available
Modern and future server-class processors will incorporate many cores. Some studies have suggested that it may be worthwhile to dedicate some of the many cores for specific tasks such as operating system execution. OS off-loading has two main benefits: improved performance due to better cache utilization and improved power efficiency due to smarter...
Article
Printmaking is a fine art practice that encompasses a variety of media including intaglio, relief, lithography and screen-printing. In this collaborative research project the authors extend the traditional boundaries of printmaking to create editions of micro-scale prints on the surface of silicon integrated circuits using the layers of materials n...
Article
Full-text available
Bounding volume hierarchies are a popular choice for ray tracing animated scenes due to the relative simplicity of refitting bounding volumes around moving geometry. However, the quality of such a refitted tree can degrade rapidly if objects in the scene deform or rearrange significantly as the animation progresses, resulting in dramatic increases...
Article
Full-text available
We describe a cross-disciplinary collaborative course that pairs computer science and engineering (CSE) students with art students to engage in joint engineering design and creative studio projects. These projects combine embedded system design with sculpture to create kinetic art. We believe that this is a natural pairing of two disparate disci-...
Article
Full-text available
The design of computer games can be a powerful motivator as students learn about computer architecture and design. Students in classes where computer designs are developed and implemented (usually on Field Programmable Gate Arrays (FPGAs)) seem much more highly motivated if their computer design can be used for something visual and interactive wh...
Conference Paper
Ray tracing efficiently models complex illumination effects to improve visual realism in computer graphics. Typical modern GPUs use wide SIMD processing, and have achieved impressive performance for a variety of graphics processing including ray tracing. However, SIMD efficiency can be reduced due to the divergent branching and memory access patter...
Conference Paper
Full-text available
In the past ten years, computer architecture has seen a paradigm shift from emphasizing single thread performance to energy efficient, throughput oriented, chip multiprocessors. Several studies have suggested that it may be worthwhile to off-load execution of the operating system (OS) to one or more of these cores, or reconfigure hardware during OS...
Article
Full-text available
Threaded Ray eXecution (TRaX) is a highly parallel multithreaded multicore processor architecture designed for real-time ray tracing. The TRaX architecture consists of a set of thread processors that include commonly used functional units (FUs) for each thread and that share larger FUs through a programmable interconnect. The memory system takes ad...
Conference Paper
Full-text available
A synthetic noise function is a key component of most computer graphics rendering systems. This pseudo-random noise function is used to create a wide variety of natural looking textures that are applied to objects in the scene. To be useful, the generated noise should be repeatable while exhibiting no discernible periodicity, anisotropy, or ali...
Article
Large-scale multi-core chips open up the possibility of implementing heterogeneous cores on a single chip, where some cores can be customized to execute common code patterns. The operating system is an example of a common code pattern that is constantly executing on every processor. It is therefore a prime candidate for core customization. Recent w...
Article
Full-text available
Large-scale multi-core chips open up the possibility of implementing heterogeneous cores on a single chip, where some cores can be customized to execute common code patterns. The operating system is an example of a common code pattern that is constantly executing on every processor. It is therefore a prime candidate for core customization. Re...
Conference Paper
TRaX (Threaded Ray eXecution) is a highly parallel multi-threaded, multi-core processor architecture designed for real-time ray tracing. One motivation behind TRaX is to accelerate single-ray performance instead of relying on ray-packets in SIMD mode to boost throughput, which can fail as packets become incoherent. To evaluate the effectiveness of...
Conference Paper
Full-text available
Ray tracing is a technique used for generating highly realistic computer graphics images. In this paper, we explore the design of a simple but extremely parallel, multi-threaded, multi-core processor architecture that performs real-time ray tracing. Our architecture, called TRaX for Threaded Ray eXecution, consists of a set of thread states that in...
Article
The modern graphics processing units (GPUs), found on almost every personal computer, use the z-buffer algorithm to compute visibility. Ray tracing, an alternative to the z-buffer algorithm, delivers higher visual quality than the z-buffer algorithm but has historically been too slow for interactive use. However, ray tracing has benefited from impr...
Article
Full-text available
Almost all current games are implemented using the graphics processing units (GPUs) found on almost every PC. These GPUs use the z-buffer algorithm to do visibility calculations. Ray tracing, an alternative to the z-buffer algorithm, delivers higher visual quality than the z-buffer algorithm but has historically been too slow for interactive use. H...
Article
Full-text available
Figure 1: Test scenes used to evaluate the DRPU ASIC: Conference (282k triangles), Mafia (15k triangles), Skeleton (16k triangles), Helix (78k triangles), and DynGael (85k triangles). For more test scenes see Figure 6. Abstract: Recursive ray tracing is a powerful rendering technique used to compute realistic images by simulating the global light t...
Conference Paper
We present a technique for generating robust self-timed completion signals for general dynamic datapath circuits. The wrapper circuit is based on our previous domino semi-bundled delay (SBD) circuits, but uses DCVSL circuits in the wrapper for higher performance. We describe the basic SBD-DCVSL building blocks in the template with respect to their...
Conference Paper
We propose a simulation-based technique for analysis and optimization of extended burst-mode (XBM) asynchronous controllers. In asynchronous controllers of this sort, timing information on control signals is significant both for performance enhancement and timing validation. Timing information, specifically information about relative signal arrival...
Conference Paper
Asynchronous microengines are an attractive alternative to globally synchronous systems for the realization of high performance programmable controllers. However, because of the specific demands of asynchronous signaling, it is not always easy to use existing standard cell libraries to implement asynchronous microengines. In this paper we present t...
Conference Paper
Full-text available
Simulators for digital systems operate at a variety of levels of abstraction varying from detailed analog and switch level modeling of the transistor to cycle based descriptions of entire systems. We propose an even higher level simulator, called ARCS, based on the abstraction of an asynchronous communication event rather than of a clock cycle. Mod...
Article
In order to reason about the correctness of asynchronous circuit implementations and specifications, Dill has developed a variant of trace theory [1]. Trace theory describes the behavior of an asynchronous circuit by representing its possible executions as strings called "traces". A useful relation defined in this theory is called conformance, which...
Article
The NSR (Non-Synchronous RISC) processor is a general purpose processor structured as a collection of self-timed units that operate concurrently and communicate over bundled data channels in the style of micropipelines. These units correspond to standard synchronous pipeline stages such as Instruction Fetch, Instruction Decode, Execute, Memory Inte...
Conference Paper
Full-text available
We introduce a simple hierarchical design technique for using dynamic domino circuits to build high-performance self-timed data path circuits. We wrap the dynamic domino circuit in a wrapper that communicates using a request/acknowledge protocol and mediates the pre-charge/evaluate cycle of the dynamic logic. We apply standard bundled delay matchin...
Conference Paper
We introduce a simple hierarchical design technique for building high-performance self-timed components using dynamic domino-style circuits. This technique is useful for building handshaking style functional blocks and for self-timed data path components. We wrap the dynamic domino circuit in a wrapper that communicates using a request/acknowledge...
Article
Full-text available
This thesis describes an evaluation of a locally-clocked module. Locally-clocked modules can be used as synchronous datapath elements in synchronous systems or as asynchronous elements in an asynchronous system. One key element of a locally-clocked module is a stoppable ring oscillator (or stoppable clock). If locally-clocked modules are to be used,...
Article
Full-text available
Verifying the functional correctness of a parameterized protocol on all valid branching networks is a difficult problem that can not be solved using simulation alone because the number and shape of valid branching networks is unbounded. This problem is of particular interest because multibus I/O and memory protocols can be modelled as parameterized...
Article
Full-text available
In order to increase performance, circuit designers are beginning to use more aggressive timed circuit designs instead of traditional synchronous static logic designs. Recent design examples have shown that significant performance gains are achieved when these aggressive circuit styles are used. Correct operation of these aggressive circuit styles i...
Article
This thesis presents a method of deriving a performance metric for timed asynchronous circuits called a stochastic cycle period, which uses analytical techniques combined with simulation to capture the stochastic profile of the system. The stochastic cycle period is constructed by finding transition and steady-state probabilities in a reachability grap...
Article
Full-text available
This dissertation addresses the problem of formally verifying the correctness of pipelined microprocessors at the micro-architectural level of abstraction. Contemporary processor designs are highly complex, employing sophisticated performance enhancing techniques such as superscalar pipelining, out-of-order execution, branch prediction and speculat...
Conference Paper
Full-text available
Designing asynchronous circuits is becoming easier as a number of design styles are making the transition from research projects to real, usable tools. However designing asynchronous “systems” is still a difficult problem. We define asynchronous systems to be medium to large digital systems whose descriptions include both datapath and control, that...
Article
Full-text available
Asynchronous or self-timed systems that do not rely on a global clock to keep system components synchronized can offer significant advantages over traditional clocked circuits in a variety of applications. However, design of self-timed systems has long been considered too difficult because of the specialized circuits required and the lack of tool...
Article
Full-text available
In order to increase performance, circuit designers are beginning to move away from traditional, synchronous designs based on static logic. Recent design examples have shown that significant performance gains are realized when aggressive circuit styles are used. Circuit correctness in these aggressive circuit styles is highly timing dependent, and...
Conference Paper
Full-text available
Asynchronous systems are being viewed as an increasingly viable alternative to purely synchronous systems. This paper gives an overview of the current state of the art in practical asynchronous circuit and system design in four areas: controllers, datapaths, processors, and the design of asynchronous/synchronous interfaces.
Article
Full-text available
Asynchronous circuit design has the potential to produce circuits superior to those of synchronous circuit design. Current synchronous methods of architectural-level synthesis do not exploit properties inherent to asynchronous circuits. This research describes potential optimizations and techniques that can be applied to the architectural-level des...
Article
Most high-level synthesis tools for asynchronous circuits take descriptions in concurrent hardware description languages and generate networks of macromodules or handshake components. In this paper, we propose a peephole optimizer for these networks. Our peephole optimizer first deduces an equivalent blackbox behavior for the network using Dill's t...
Conference Paper
Full-text available
Impulse is a new memory system architecture that adds two important features to a traditional memory controller. First, Impulse supports application-specific optimizations through configurable physical address remapping. By remapping physical addresses, applications control how their data is accessed and cached, improving their cache and bus utiliz...
Article
Full-text available
This thesis presents a framework for the specification and compilation of modules in a system that uses different synchronization paradigms. These timed systems are described by using timed handshaking expansions (HSE) and a standard hardware description language, namely VHDL. Synthesizable subsets of these languages are defined to include construc...
Article
Full-text available
The design and synthesis of asynchronous circuits is gaining importance in both the industrial and academic worlds. Timed circuits are a class of asynchronous circuits that incorporate explicit timing information in the specification. This information is used throughout the synthesis procedure to optimize the design. In order to synthesize a timed...
Conference Paper
Full-text available
Because irregular applications have unpredictable memory access patterns, their performance is dominated by memory behavior. The Impulse configurable memory controller will enable significant performance improvements for irregular applications, because it can be configured to optimize memory accesses on an application-by-application basis. In this...
Conference Paper
This paper presents a Design for Testability (DFT) tool called ACT (Asynchronous Circuit Testing) which uses a partial scan technique to make macro-module based self-timed circuits testable. The ACT tool is the first of its kind for testing macro-module based self-timed circuits. ACT modifies designs automatically to incorporate partial scan and pr...
Conference Paper
Recent practical advances in asynchronous circuit and system design have resulted in renewed interest by circuit designers. Asynchronous systems are being viewed as an increasingly viable alternative to globally synchronous system organization. This tutorial will present the current state of the art in asynchronous circuit and system design in thre...
Conference Paper
Full-text available
We describe a technique to generate critical hazard-free tests for self-timed control circuits built using a macro-module library, in a partial scan based DFT environment. We propose a six-valued algebra to generate these tests which are guaranteed to be critical hazard free under an unbounded delay model. This algebra has been incorporated in a D-...
Article
Full-text available
The computer hacker has been depicted in the popular press as a socially maladjusted teenager whose goal is to wreak malicious havoc on unsuspecting computer users. In the culture of the computer programmer however, the hacker takes on a far different aspect. The true hacker is raised to heroic status with tales of amazing feats circulated through co...
Article
Self-timed processor designs offer several advantages over traditional synchronous designs. Further, when an asynchronous philosophy is incorporated at every stage of the design, the microarchitecture is more closely linked to the basic structures of the self-timed circuits themselves, and the resulting processor is quite simple and elegant. The Fr...
Conference Paper
Decoupled computer architectures provide an effective means of exploiting instruction level parallelism. Self-timed micropipeline systems are inherently decoupled due to the elastic nature of the basic FIFO structure, and may be ideally suited for constructing decoupled computer architectures. Fred is a self-timed decoupled, pipelined computer arch...
Article
The problems with synchronous designs at high clock frequencies have been well documented. This makes an asynchronous approach attractive for high speed technologies like GaAs. We investigate the issues involved by describing the design of a parallel multiplier that can be part of a floating point multiplier. We first present a new architecture cal...
Conference Paper
In this paper, we present a methodology to perform fast testing of the control path of self-timed circuits. The speedup is achieved by testing all the execution paths in the control simultaneously. The circuits considered in this paper are those designed using an OCCAM based circuit compiler (1991). This compiler translates an OCCAM program descrip...
Conference Paper
Full-text available
Self-timed systems structured as multiple concurrent processes and communicating through self-timed queues are a convenient way to implement decoupled computer architectures. Machines of this type can exploit instruction level parallelism in a natural way, and can be easily modified and extended. However, providing a precise exception model for a s...
Article
Self-timed systems structured as multiple concurrent processes and communicating through self-timed queues are a convenient way to implement decoupled computer architectures. Machines of this type can exploit instruction level parallelism in a natural way, and can be easily modified and extended. However, providing a precise exception model for a s...
Conference Paper
This paper presents a partial scan method for testing both the control and data path parts of macromodule based self-timed circuits for stuck-at faults. Compared with other proposed test methods for testing control paths in self-timed circuits, this technique offers better fault coverage under a stuck-at input model than methods using self-checking...
Conference Paper
Self-timed flow-through FIFOs are constructed easily using only a single C-element as control for each stage of the FIFO. Throughput can be very high in this type of FIFO as the communication required to send new data to the FIFO is local to only the first element of the FIFO. Circuit density can also be high because the control overhead is very sm...