Mark Gebhart

University of Texas at Austin | UT · Department of Computer Science

About

16
Publications
1,794
Reads
723
Citations

Publications (16)
Article
Throughput architectures such as GPUs require substantial hardware resources to hold the state of a massive number of simultaneously executing threads. While GPU register files are already enormous, reaching capacities of 256KB per streaming multiprocessor (SM), we find that nearly half of real-world applications we examined are register-bound and...
Conference Paper
Modern throughput processors such as GPUs employ thousands of threads to drive high-bandwidth, long-latency memory systems. These threads require substantial on-chip storage for registers, cache, and scratchpad memory. Existing designs hard-partition this local storage, fixing the capacities of these structures at design time. We evaluate modern GP...
Article
Full-text available
Modern graphics processing units (GPUs) employ a large number of hardware threads to hide both function unit and memory access latency. Extreme multithreading requires a complex thread scheduler as well as a large register file, which is expensive to access both in terms of energy and latency. We present two complementary techniques for reducing en...
Article
Full-text available
As processors increasingly become power limited, performance improvements will be achieved by rearchitecting systems with energy efficiency as the primary design constraint. While some of these optimizations will be hardware based, combined hardware and software techniques likely will be the most productive. This work redesigns the register file s...
Conference Paper
Full-text available
Modern graphics processing units (GPUs) use a large number of hardware threads to hide both function unit and memory access latency. Extreme multithreading requires a complicated thread scheduler as well as a large register file, which is expensive to access both in terms of energy and latency. We present two complementary techniques for reducing e...
Conference Paper
Modern graphics processing units (GPUs) use a large number of hardware threads to hide both function unit and memory access latency. Extreme multithreading requires a complicated thread scheduler as well as a large register file, which is expensive to access both in terms of energy and latency. We present two complementary techniques for reducing e...
Article
Full-text available
Modern multicore chips target thread-level parallelism at the expense of increasing instruction-level parallelism from single threaded programs. While recent work has attempted to construct a wide-ILP machine from multiple simple cores, these approaches suffer from ISA overheads or scalability challenges. In this paper, we describe an architecture...
Article
The TRIPS hardware prototype is the first instantiation of an Explicit Data Graph Execution (EDGE) architecture. Building the compiler, toolset, and system software for the prototype required supporting the system's unique dataflow construction, its banked register and memory configurations, and its novel Instruction Set Architecture. In particu-...
Article
The TRIPS system employs a new instruction set architecture (ISA) called Explicit Data Graph Execution (EDGE) that renegotiates the boundary between hardware and software to expose and exploit concurrency. EDGE ISAs use a block-atomic execution model in which blocks are composed of dataflow instructions. The goal of the TRIPS design is to mine conc...
Conference Paper
Full-text available
The TRIPS system employs a new instruction set architecture (ISA) called Explicit Data Graph Execution (EDGE) that renegotiates the boundary between hardware and software to expose and exploit concurrency. EDGE ISAs use a block-atomic execution model in which blocks are composed of dataflow instructions. The goal of the TRIPS design is to mine conc...
Conference Paper
The TRIPS system employs a new instruction set architecture (ISA) called Explicit Data Graph Execution (EDGE) that renegotiates the boundary between hardware and software to expose and exploit concurrency. EDGE ISAs use a block-atomic execution model in which blocks are composed of dataflow instructions. The goal of the TRIPS design is to mine conc...
Technical Report
The TRIPS system employs a new instruction set architecture (ISA) called Explicit Data Graph Execution (EDGE) that renegotiates the boundary between hardware and software to expose and exploit concurrency. EDGE ISAs use a block-atomic execution model in which blocks are composed of dataflow instructions. The goal of the TRIPS design is to mine c...
Article
Diminishing performance gains in conventional architectures are driving modern architectures to exploit parallelism more effectively. Next-generation architectures hold promise in the Digital Signal Processing (DSP) arena where high performance and power efficiency are equally important. To better identify optimization techniques on these emergi...
Article
Diminishing performance gains in conventional architectures are fueling novel designs which more effectively extract parallelism and have the potential to change the nature of architectural bottlenecks. Consequently, workload characterization is of a growing importance in the design of modern high performance computing architectures. However...
Article
Acknowledgments: I would like to thank the following people: my supervisor Dr. Stephen Keckler for his guidance over the last two years; Saurabh Drolia, Sadia Sharif, Paul Gratz, and the TRIPS team members for their help throughout this project. Mark Gebhart, The University of Texas at Austin, May 2006. Implementation of a Streaming Parallel Mode...
