Eric Hielscher’s research while affiliated with The Graduate Center, CUNY and other places


Publications (2)


Figure 9: Matrix Multiply Performance vs Tile Size 
Figure 10: Matrix Multiply Runtimes 
Figure 11: Matrix Multiply Autotuner Performance 
Figure 12: Matrix Multiply Performance Compared To C
Locality Optimization for Data Parallel Programs
  • Article
  • Full-text available

April 2013 · 1,111 Reads · 5 Citations

Eric Hielscher

Productivity languages such as NumPy and Matlab make it much easier to implement data-intensive numerical algorithms. However, these languages can be intolerably slow for programs that don't map well to their built-in primitives. In this paper, we discuss locality optimizations for our system Parakeet, a just-in-time compiler and runtime system for an array-oriented subset of Python. Parakeet dynamically compiles whole user functions to high performance multi-threaded native code. Parakeet makes extensive use of the classic data parallel operators Map, Reduce, and Scan. We introduce a new set of data parallel operators, TiledMap, TiledReduce, and TiledScan, that break up their computations into local pieces of bounded size so as to make better use of small fast memories. We introduce a novel tiling transformation to generate tiled operators automatically. Applying this transformation once tiles the program for cache, and applying it again enables tiling for registers. The sizes for cache tiles are left unspecified until runtime, when an autotuning search is performed. Finally, we evaluate our optimizations on benchmarks and show significant speedups on programs that exhibit data locality.
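
The idea of breaking a data parallel operator into bounded-size local pieces can be illustrated with a minimal sketch in plain Python and NumPy. This is not Parakeet's API or its generated code: the names `tiled_reduce`, `tiled_matmul`, and `tile_size` are hypothetical, and the tile size is passed in by hand here, whereas the abstract describes choosing cache tile sizes through an autotuning search at runtime. The reduction sketch also assumes that `combine` is associative and that `init` is its identity element.

```python
import numpy as np


def tiled_reduce(combine, xs, init, tile_size):
    """Reduce xs tile by tile, then fold each tile's partial result into the total.

    Assumes combine is associative and init is its identity element.
    """
    acc = init
    for start in range(0, len(xs), tile_size):
        partial = init
        # Each tile touches at most tile_size elements, so it can stay in fast memory.
        for x in xs[start:start + tile_size]:
            partial = combine(partial, x)
        acc = combine(acc, partial)
    return acc


def tiled_matmul(a, b, tile_size):
    """Multiply a (n x k) by b (k x m) one tile_size-square block at a time."""
    n, k = a.shape
    k2, m = b.shape
    if k != k2:
        raise ValueError("inner dimensions must match")
    out = np.zeros((n, m), dtype=np.result_type(a, b))
    for i in range(0, n, tile_size):
        for j in range(0, m, tile_size):
            for p in range(0, k, tile_size):
                # Slices past the array edge are clipped, so ragged tiles are handled.
                out[i:i + tile_size, j:j + tile_size] += (
                    a[i:i + tile_size, p:p + tile_size]
                    @ b[p:p + tile_size, j:j + tile_size]
                )
    return out
```

Calling `tiled_matmul(a, b, tile_size)` with different values of `tile_size` on the same inputs gives the same product while changing how much data each inner step touches, which is the quantity an autotuner would search over.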


Figure 6: K-Means GPU Times with 30 Features, K = 3 
Parakeet: A Just-In-Time Parallel Accelerator for Python

January 2012 · 190 Reads · 34 Citations

High level productivity languages such as Python or Matlab enable the use of computational resources by non-expert programmers. However, these languages often sacrifice program speed for ease of use. This paper proposes Parakeet, a library which provides a just-in-time (JIT) parallel accelerator for Python. Parakeet bridges the gap between the usability of Python and the speed of code written in efficiency languages such as C++ or CUDA. Parakeet accelerates data-parallel sections of Python that use the standard NumPy scientific computing library. Parakeet JIT compiles efficient versions of Python functions and automatically manages their execution on both GPUs and CPUs. We assess Parakeet on a pair of benchmarks and achieve significant speedups.

Citations (2)


... Little previous work has been done on automated tiling of functional programs composed of arbitrarily nested parallel patterns. Hielscher proposes a set of formal rules for tiling parallel operators map, reduce, and scan in the Parakeet JIT compiler, but these rules can be applied only for a small subset of nesting combinations [25]. Spartan [26] is a runtime system with a set of high-level operators (e.g., map and reduce) on multi-dimensional arrays, which automatically tiles and distributes the arrays in a way that minimizes the communication cost between nodes in cluster environments. ...

Reference:

Generating Configurable Hardware from Parallel Patterns
Locality Optimization for Data Parallel Programs

... There are many works on extending the Python programming language for GPU programming, e.g., [6,26,35,42]. Closest to our approach are Parakeet [35] and Copperhead [6], which are high-level data-parallel languages embedded in Python. In both DSLs, programmers write array computations that are made parallel by the use of skeletons like map and reduce. ...

Parakeet: A Just-In-Time Parallel Accelerator for Python