Conference PaperPDF Available

Controlling chaos on safe side-effects in data-parallel operations

Authors:

Abstract and Figures

With the rising variety of hardware designs for multi-core systems, the effectiveness in exploiting implicit concurrency of programs plays a more vital role for programming such systems than ever before. We believe that a combination of a data-parallel approach with a declarative programming-style is up to that task: Data-parallel approaches are known to enable compilers to make efficient use of multi-processors without requiring low-level program annotations. Combining the data-parallel approach with a declarative programming-style guarantees semantic equivalence between sequential and concurrent executions of data parallel operations. Furthermore, the side-effect free setting and explicit model of dependencies enables compilers to maximise the size of the data-parallel program sections. However, the strength of the rigidity of the declarative approach also constitutes its weakness: Being bound to observe all data dependencies categorically rules out the use of side-effecting operations within data-parallel sections. Not only does this limit the size of these regions in certain situations, but it may also hamper an effective workload distribution. Considering side effects such as plotting individual pixels of an image or output for debugging purposes, there are situations where a non-deterministic order of side-effects would not be considered harmful at all. We propose a mechanism for enabling such non-determinism on the execution of side-effecting operations within data-parallel sections without sacrificing the side-effect free setting in general. Outside of the data-parallel sections we ensure single-threading of side-effecting operations using uniqueness typing. Within data-parallel operations however we allow the side-effecting operations of different threads to occur in any order, as long as effects of different threads are not interleaved. Furthermore, we still model the dependencies arising from the manipulated states within the data parallel sections. This measure preserves the explicitness of all data dependencies and therefore it preserves the transformational potential of any restructuring compiler.
Content may be subject to copyright.
A preview of the PDF is not available
Conference Paper
We present the ins and outs of the purely functional, data parallel programming language SaC (Single Assignment C). SaC defines state- and side-effect-free semantics on top of a syntax resembling that of imperative languages like C/C++/C# or Java: functional programming with curly brackets. In contrast to other functional languages data aggregation in SaC is not based on lists and trees, but puts stateless arrays into the focus. SaC implements an abstract calculus of truly multidimensional arrays that is adopted from interpreted array languages like Apl. Arrays are abstract values with certain structural properties. They are treated in a holistic way, not as loose collections of data cells or indexed memory address ranges. Programs can and should be written in a mostly index-free style. Functions consume array values as arguments and produce array values as results. The array type system of SaC allows such functions to abstract not only from the size of vectors or matrices but likewise from the number of array dimensions, supporting a highly generic programming style. The design of SaC aims at reconciling high productivity in software engineering of compute-intensive applications with high performance in program execution on modern multi- and many-core computing systems. While SaC competes with other functional and declarative languages on the productivity aspect, it competes with hand-parallelised C and Fortran code on the performance aspect. We achieve our goal through stringent co-design of programming language and compilation technology. The focus on arrays in general and the abstract view of arrays in particular combined with a functional state-free semantics are key ingredients in the design of SaC. In conjunction they allow for far-reaching program transformations and fully compiler-directed parallelisation. From literally the same source code SaC currently supports symmetric multi-socket, multi-core, hyperthreaded server systems, CUDA-enables graphics accelerators and the MicroGrid, an innovative general-purpose many-core architecture. The CEFP lecture provides an introduction into the language design of SaC, followed by an illustration of how these concepts can be harnessed to write highly abstract, reusable and elegant code. We conclude with outlining the major compiler technologies for achieving runtime performance levels that are competitive with low-level machine-oriented programming environments.
Article
Full-text available
We give an in-depth introduction to the design of our functional array programming language SaC, the main aspects of its compilation into host machine code, and its parallelisation based on multi-threading. The language design of SaC aims at combining high-level, compositional array programming with fully automatic resource management for highly productive code development and maintenance. We outline the compilation process that maps SaC programs to computing machinery. Here, our focus is on optimisation techniques that aim at restructuring entire applications from nested compositions of general fine-grained operations into specialised coarse-grained operations. We present our implicit parallelisation technology for shared memory architectures based on multi-threading and discuss further optimisation opportunities on this level of code generation. Both optimisation and parallelisation rigorously exploit the absence of side-effects and the explicit data flow characteristic of a functional setting.
Conference Paper
Full-text available
Construction of complex array operations by composition of more basic ones allows for abstract and concise specifications of algorithms. Unfortunately, naïve compilation of such specifications leads to creation of many temporary arrays at runtime and, consequently, to poor performance characteristics. This paper elaborates on a new compiler optimization, named WITH-LOOP-SCALARIZATION, which aims at eliminating temporary arrays in the context of nested array operations. It is based on WITH-loops, a versatile array comprehension construct used by the functional array language SAC both for specification as well as for internal representation of array operations. The impact of WITH-LOOP-SCALARIZATION on the runtime performance of compiled SAC code is demonstrated by several experiments involving support for arithmetic on arrays of complex numbers and the application kernel FT from the NAS benchmark suite.
Conference Paper
Full-text available
Bulk Synchronous Parallel ML or BSML is a functional data-parallel language for programming bulk synchronous parallel (BSP) algorithms. The execution time can be estimated and dead-locks and indeterminism are avoided. For large scale applications where parallel processing is helpful and where the total amount of data often exceeds the total main memory available, parallel disk I/O becomes a necessity. We present here a library of I/O features for BSML and its cost model.
Article
Full-text available
Functional programming languages have banned assignment because of its undesirable properties. The reward of this rigorous decision is that functional programming languages are side-effect free. There is another side to the coin: because assignment plays a crucial role in Input/Output (I/O), functional languages have a hard time dealing with I/O. Func - tional pro gramming languages have therefore often been stigmatised as inferior to imper- ative programming languages because they cannot deal with I/O very well. In this paper we show that I/O can be incorporated in a functional programming language without loss of any of the generally accepted advantages of functional programming languages. This discussion is supported by an extensive ac count of the I/O system offered by the lazy, pu- rely functional programming language Clean. Two aspects that are paramount in its I/O sys tem make the approach novel with respect to other approaches. These aspects are the technique of explicit multiple environment passing, and the Event I/O framework to pro - gram Graphical User I/O in a highly struc tured and high-level way. Clean file I/O is as powerful and flex ible as it is in common imperative languages (one can read, write, and seek directly in a file). Clean Event I/O provides pro grammers with a high-level frame - work to specify complex Graphical User I/O. It has been used to write applications such as a window-based text editor, an object based drawing program, a relational database, and a spreadsheet program. These graphical interactive programs are com pletely machine independent, but still obey the look-and-feel of the concrete win dow environment being used. The specifications are completely functional and make extensive use of uniqueness typ ing, higher-order functions, and alge braic data types. Efficient implementations are present on the Macintosh, Sun (X Windows under Open Look), and PC (OS/2).
Article
Full-text available
. Sac (Single Assignment C) is a strict, purely functional programming language primarily designed with numerical applications in mind. Particular emphasis is on efficient support for arrays both in terms of language expressiveness and in terms of runtime performance. Array operations in Sac are based on elementwise specifications using so-called With-loops. These language constructs are also well-suited for concurrent execution on multiprocessor systems. This paper outlines an implicit approach to compile Sac programs for multi-threaded execution on shared memory architectures. Besides the basic compilation scheme, a brief overview of the runtime system is given. Finally, preliminary performance figures demonstrate that this approach is well-suited to achieve almost linear speedups. 1 Introduction Sac (Single Assignment C) is a strict, first-order, purely functional programming language primarily designed with numerical applications in mind. Particular emphasis is on effi...
Conference Paper
We report our experiences of programming in the functional language SaC[1] a numerical method for the KPI (Kadomtsev-Petiviashvili I) equation. KPI describes the propagation of nonlinear waves in a dispersive medium. It is an integro-differential, nonlinear equation with third-order derivatives, and so it presents a noticeable challenge in numerical solution, as well as being an important model for a range of topics in computational physics. The latter include: long internal waves in a density-stratified ocean, ion-acoustic waves in a plasma, acoustic waves on a crystal lattice, and more. Thus our solution of KPI in SaC represents an experience of solving a “real” problem using a single-assignment language and as such provides an insight into the kind of challenges and benefits that arise in using the functional paradigm in computational applications. The paper describes the structure and functionality of the program, discusses the features of functional programming that make it useful for the task in hand, and touches upon performance issues.