ParT: An Asynchronous Parallel Abstraction
for Speculative Pipeline Computations⋆
Kiko Fernandez-Reyes, Dave Clarke, and Daniel S. McCain
Department of Information Technology
Uppsala University, Uppsala, Sweden
Abstract. The ubiquity of multicore computers has forced programming language designers to rethink how languages express parallelism and concurrency. This has resulted in new language constructs and new combinations or revisions of existing constructs. In this line, we extended the programming languages Encore (actor-based) and Clojure (functional) with an asynchronous parallel abstraction called ParT, a data structure that can dually be seen as a collection of asynchronous values (integrating with futures) or a handle to a parallel computation, plus a collection of combinators for manipulating the data structure. The combinators can express parallel pipelines and speculative parallelism. This paper presents a typed calculus capturing the essence of ParT, abstracting away from details of the Encore and Clojure programming languages. The calculus includes tasks, futures, and combinators similar to those of Orc but implemented in a non-blocking fashion. Furthermore, the calculus strongly mimics how ParT is implemented, and it can serve as the basis for adaptation of ParT into different languages and for further extensions.
1 Introduction
The ubiquity of multicore computers has forced programming language designers
to rethink how languages express parallelism and concurrency. This has resulted
in new language constructs that, for instance, increase the degree of asynchrony
while exploiting parallelism. A promising direction is programming languages
with constructs for tasks and actors, such as Clojure and Scala [8, 16], due to
the lightweight overhead of spawning parallel computations. These languages
offer coarse-grained parallelism at the task and actor level, where futures act
as synchronisation points. However, these languages are lacking in high-level
coordination constructs over these asynchronous computations. For instance, it is
not easy to express dependence on the first result returned by a bunch of futures
and to safely terminate the computations associated with the other futures.
The task of terminating speculative parallelism is quite delicate, as the futures
may have attached parallel computations that depend on other futures, creating
complex dependency patterns that need to be tracked down and terminated.
⋆ Partly funded by the EU project FP7-612985 UpScale: From Inherent Concurrency
to Massive Parallelism through Type-based Optimisations.
To address this need, this paper presents the design and implementation of
ParT, a non-blocking abstraction that asynchronously exploits futures and en-
ables the developer to build complex, data parallel coordination workflows using
high-level constructs. These high-level constructs are derived from the combi-
nators of the orchestration language Orc [11,12]. ParT is formally expressed in
terms of a calculus that, rather than being at a high level of abstraction, strongly
mimics how this asynchronous abstraction is implemented and is general enough
to be applied to programming languages with notions of futures.
The contributions of the paper are as follows: the design of an asynchronous
parallel data abstraction to coordinate complex workflows, including pipeline
and speculative parallelism, and a typed, non-blocking calculus modelling this
abstraction, which integrates futures, tasks and Orc-like combinators, supports
the separation of the realisation of parallelism (via tasks) from its specification,
and offers a novel approach to terminating speculative parallelism.
2 Overview
To set the scene for this paper, we begin with a brief overview of asynchronous
computations with futures and provide an informal description of the ParT ab-
straction and its combinators. A SAT solver example is used as an illustration.
In languages with notions of tasks and active objects [2, 8, 16], asynchronous
computations are created by spawning tasks or calling methods on active objects.
These computations can exploit parallelism by decoupling the execution of the
caller and the callee [7]. The result of a spawn or method call is immediately
a future, a container that will eventually hold the result of the asynchronous
computation. A future that has received a value is said to be fulfilled. Operations
on futures may be blocking, such as getting the result from a future, or may be
asynchronous, such as attaching a callback to a future. This second operation,
called future chaining and represented by f ↝ callback, immediately returns
a new future, which will contain the result of applying the callback function
callback to the contents of the original future after it has been fulfilled. A future
can also be thought of as a handle to an asynchronous computation that can be
extended via future chaining or even terminated. This is a useful perspective
that we will further develop in this work. In languages with notions of actors,
such as Clojure and Encore [2], asynchrony is the rule and blocking on futures
suffers a large performance penalty. But creating complex coordination patterns
based on a collection of asynchronous computations without blocking threads
(to maintain the throughput of the system) is no easy task.
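To make these operations concrete, here is a minimal sketch of futures with non-blocking chaining in Haskell, assuming only GHC's base library; the names Fut, asyncTask, chain and get are ours, not Encore's:

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)

-- A future is a write-once cell that will eventually hold a value.
newtype Fut a = Fut (MVar a)

-- Spawn an asynchronous computation; the returned future is fulfilled
-- when the computation finishes.
asyncTask :: IO a -> IO (Fut a)
asyncTask body = do
  cell <- newEmptyMVar
  _ <- forkIO (body >>= putMVar cell)
  return (Fut cell)

-- Future chaining (f ↝ callback): returns a new future immediately;
-- a fresh task waits for f to be fulfilled and then applies the callback.
chain :: Fut a -> (a -> b) -> IO (Fut b)
chain (Fut cell) callback = asyncTask (callback <$> readMVar cell)

-- Blocking get, shown only for contrast with the asynchronous chain.
get :: Fut a -> IO a
get (Fut cell) = readMVar cell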
To address this need, we have designed an abstraction, called ParT, which
can be thought of as a handle to an ongoing parallel computation, allowing
the parallel computation to be manipulated, extended, and terminated. A ParT
is a functional data structure, represented by type Par t, that can be empty
({} :: Par t) or contain a single expression ({−} :: t → Par t). Futures attached to
computations producing values are lifted into ParTs using (↑) :: Fut t → Par t,
and computations producing ParTs are embedded using (↑↑) :: Fut (Par t) → Par t. Multiple ParTs
1   def fut2Par(f: Fut (Maybe a)): Par a
2     (f ~~> \(m: Maybe a) ->
3        match m with Nothing => {}; Just val => {val})
4
5   def evaluateFormula(form: Formula, a: Assignment): (Maybe bool, Assignment)
6     ...
7
8   def sat(st: Strategy, fml: Formula, a: Assignment): Par Assignment
9     let variable = st.getVariable(fml, a)
10        a1 = a.extendAssignment(variable, true)
11        a2 = a.extendAssignment(variable, false)
12    in
13      ({evaluateFormula(fml, a1)} || {evaluateFormula(fml, a2)}) >>=
14        \(result: (Maybe bool, Assignment)) ->
15          match result with
16            (Nothing, ar)    => sat(st, fml, ar);
17            (Just true, ar)  => {ar};
18            (Just false, ar) => {};
19
20  def process(sts: [Strategy], fml: Formula): Par Assignment
21    fut2Par << (each(sts) >>= \(s: Strategy) ->
22      (async sat(s, fml, new Assignment())))
Fig. 1: A SAT solver in Encore.
can be combined using the par constructor, || :: Par t → Par t → Par t. This
constructor does not necessarily create new parallel threads of control, as this
would likely have a negative impact on performance, but rather specifies that
parallelism is available. The scheduler in the ParT implementation can choose
to spawn new tasks as it sees fit — this is modelled in the calculus as a single
rule that nondeterministically spawns a task from a par (rule Red-Schedule).
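As a structural sketch, the ParT data type described above can be transcribed into Haskell as follows (a sketch of the shape only, reusing the Fut type from the earlier sketch; the constructor names are ours):

data Par t
  = Empty                  -- the empty ParT: {}
  | Single t               -- a singleton ParT: {v}
  | LiftF (Fut t)          -- a future for a single value, lifted into the ParT
  | LiftFP (Fut (Par t))   -- a future for a whole ParT, embedded into the ParT
  | Par t :|| Par t        -- the par constructor: available (not forced) parallelism

The :|| constructor records that its two sides may run in parallel; deciding whether to actually spawn a task is left to the scheduler, as the rule Red-Schedule of the calculus makes explicit.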
The combinators can express complex coordination patterns, operate on ParT
collections in a non-blocking manner, and safely terminate speculative parallelism
even in the presence of complex workflows. These combinators will be illustrated
using an example, then explained in more detail.
Illustrative example. Consider a portfolio-based SAT solver (Fig. 1), which creates
numerous strategies, each of which searches for an assignment of variables to
Boolean values for a given formula; the solver runs the strategies in parallel and
accepts the first solution found. Each strategy tries to find a solution by selecting a variable
and creating two instances of the formula, one where the variable is assigned
true, the other where it is assigned false (called splitting) — strategies differ in
the order they select variables for splitting. These new instances can potentially
be solved in parallel.
The example starts in function process (line 20), which receives an array of
strategies and the formula to solve. Strategies do not interact with each other
and can be lifted to a ParT, creating a parallel pipeline (line 21) using the each
and bind (>>=) combinators. As soon as one strategy finds an assignment, the
remaining computations are terminated via the prune (<<) combinator.
For each strategy, a call to the sat function (line 8) is made in parallel using
a call to async, which in this case returns a value of type Fut (Par Assignment).
Function sat takes three arguments: a strategy, a formula and an assignment
object containing the current mapping from variables to values. This function
uses the strategy object to determine which variable to split next, extends the
assignment with new valuations (lines 9–11), recursively solves the formula (by
again calling sat), and returns an assignment object if successful. The evaluation
function evaluateFormula returns, firstly, an optional Boolean to indicate
whether evaluation has completed and, if it has completed, whether the formula
is satisfiable, and secondly, the current (partial) variable assignment. The two
calls to evaluateFormula are grouped into a new ParT collection (using ||)
and, with the use of the >>= combinator, a new asynchronous pipeline is created
that either further evaluates the formula by calling sat, returns the assignment
as a singleton ParT when the formula is satisfiable, or returns {} when the
assignment does not satisfy the formula (lines 14–18).
Finally, returning to process, the prune combinator (<<) (line 21) is
used to select the first result returned by the recursive calls to sat, if there is
one. This result is converted from an option type to an empty or singleton ParT
collection (again asynchronously), which can then be used in a larger parallel
operation, if so desired. The prune combinator will begin poisoning and safely
terminating the no longer needed parallel computations, which in this case will
be an ongoing parallel pipeline of calls to sat and evaluateFormula.
ParT Combinators. The combinators are now described in detail. They manipulate
ParT collections and were derived from Orc [11, 12], although in our setting
they are typed and redefined to be completely asynchronous,
never blocking the thread. Primitive combinators express coordination patterns
such as pipeline and speculative parallelism, and more complex patterns can be
expressed based on these primitives.
Pipeline parallelism is expressed in ParT with the sequence and bind combinators.
The sequence combinator, ≫ :: Par t → (t → t′) → Par t′, takes a ParT
collection and applies the function to each element in the collection, potentially
in parallel, returning a new ParT collection. The bind combinator (derived from
other combinators), ≫= :: Par t → (t → Par t′) → Par t′, is similar to the
sequence combinator, except that the function returns a ParT collection and the
resulting nested ParT collection is flattened. (Par is a monad! The monad
operations on Par are essentially the same as for lists, but parallelised.) In the presence
of futures inside a ParT collection, these combinators use the future chaining
operation to create independent and asynchronous pipelines of work.
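On the future-free fragment of ParT, sequence and bind behave exactly like map and bind on lists. A minimal Haskell sketch over the Par type above (the cases for lifted futures, which the calculus handles by chaining in rules Red-SeqF and Red-SeqFP of Section 3, are omitted):

-- sequence (≫): apply the function to every element, keeping the structure.
seqPar :: Par a -> (a -> b) -> Par b
seqPar Empty      _ = Empty
seqPar (Single v) f = Single (f v)
seqPar (p :|| q)  f = seqPar p f :|| seqPar q f
seqPar _          _ = error "lifted futures: handled asynchronously via chaining"

-- bind (≫=): like seqPar, but the function returns a ParT that is flattened.
bindPar :: Par a -> (a -> Par b) -> Par b
bindPar Empty      _ = Empty
bindPar (Single v) f = f v
bindPar (p :|| q)  f = bindPar p f :|| bindPar q f
bindPar _          _ = error "lifted futures: handled asynchronously via chaining"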
Speculative parallelism is realised by the peek combinator, peek :: Par t →
Fut (Maybe t), which sets up a speculative computation, asynchronously waits
for a single result to be produced, and then safely terminates the speculative
work. To terminate speculative work, the ParT abstraction poisons these speculative
computations, which may have long parallel pipelines to which the poison
spreads recursively, producing a pandemic infection among futures, tasks and
pipelines of computations. Afterwards, poisoned computations that are no longer
needed can safely be terminated. Metaphorically, this is analogous to a tracing
garbage collector.
The value produced by peek is a future to an option type. The option type
is used to capture whether the parallel collection was empty or not. The empty
collection {} results in Nothing, and a non-empty collection results in Just v,
where v is the first value produced. The conversion to an option type is required
because ParTs cannot be tested for emptiness without blocking. The peek combinator
is an internal combinator, i.e., it is not available to the developer and is
used by the prune combinator (explained below).
Built on top of peek is the prune combinator, ≪ :: (Fut (Maybe t) →
Par t′) → Par t → Par t′, which applies a function in parallel to the future
produced by peek, and returns a parallel computation.
Powerful combinators can be derived from the ones mentioned above. An
example of a derived combinator, which is a primitive in Orc, is the otherwise
combinator, >< :: Par t → Par t → Par t (its derivation is shown in Section 3.1).
Expression e₁ >< e₂ results in e₁ unless it is an empty ParT, in which case it
results in e₂.
Other ParT combinators are available. For instance, each :: [t] → Par t and
extract :: Par t → [t] convert between sequential collections (arrays) and ParTs. The
latter potentially requires a lot of synchronisation, as all the values in the collection
need to be realised. Both have been omitted from the formalism, because
neither presents any real technical challenge — the key properties of the formal-
ism, namely, deadlock-freedom, type preservation and task safety (Section 3.5),
still hold with these extensions in place.
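On the same future-free fragment, each and extract are straightforward; a sketch in the Haskell notation used above (extract over a collection containing futures would additionally have to realise every future, which is exactly the synchronisation cost just mentioned):

each :: [t] -> Par t
each = foldr (\x acc -> Single x :|| acc) Empty

extract :: Par t -> [t]
extract Empty      = []
extract (Single v) = [v]
extract (p :|| q)  = extract p ++ extract q
extract _          = error "would need to block on the futures in the collection"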
3 A Typed ParT Calculus
This section presents the operational semantics and type system of a task-based
language containing the ParT abstraction. The formal model is roughly based
on the Encore formal semantics [2,5], with many irrelevant details omitted.
3.1 Syntax
The core language (Fig. 2) contains expressions e and values v. Values include
constants c, variables, futures f, lambda abstractions, and ParT collections of
values. Expressions include values v, function application (e e), task creation,
future chaining, and parallel combinators. Tasks are created via the async
expression, which returns a future. The parallel combinators are those covered in
Section 2 (||, ≫, peek and ≪), plus some derived combinators, together with
the low-level combinator join that flattens nested ParT collections. Recall that
peek is used under-the-hood in the implementation of ≪. Status π controls how
peek behaves: when π is ∅ and the result of peek is an empty ParT collection,
e ::= v | e e | async e | e ↝ e | {e} | e || e
    | e ≫ e | e ≪ e | ↑e | ↑↑e | join e | peek_π e
v ::= c | f | x | λx.e | {} | {v} | ↑f | ↑↑f | v || v
π ::= ' ' | ∅

Fig. 2: Syntax of the language.
the value is discarded and not written to the corresponding future. This status
helps to ensure that precisely one speculative computation writes into the fu-
ture and that a speculative computation fails to produce a value only when all
relevant tasks fail to produce a value.
ParT collections are monoids, meaning that the composition operation e || e
is associative and has {} as its unit. As such, ParT collections are sequences,
though no operations such as getting the first element are available to access
them sequentially. As an alternative, adding in commutativity of || would give
multiset semantics to the ParT collections — the operational semantics is oth-
erwise unchanged. Two for one!
A number of the constructs are defined by translation into other constructs.

let x = e in e′ ≜ (λx.e′) e
e₁ >< e₂ ≜ let x = e₁ in (λy. ↑↑(y ↝ (λz. match z with Nothing → e₂; _ → x))) ≪ x
e₁ ≫= e₂ ≜ join (e₁ ≫ e₂)
maybe2par ≜ λx. match x with Nothing → {}; Just y → {y}
The encoding of let is standard. In e₁ >< e₂, pruning is used to test
the emptiness of e₁. If it is not empty, the result of e₁ is returned; otherwise
the result is e₂. The definition of ≫= is the standard definition of monadic bind
in terms of map (≫) and join. We assume for convenience a Maybe type and
pattern matching on it.
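On the future-free fragment, e₁ >< e₂ amounts to an emptiness test on e₁; the following Haskell sketch captures this reading (in the full calculus the test is performed asynchronously via pruning, never by blocking):

otherwisePar :: Par a -> Par a -> Par a
otherwisePar p q = if isEmptyPure p then q else p
  where
    -- Exact only in the absence of lifted futures; with futures in the
    -- collection, emptiness can only be discovered asynchronously via peek.
    isEmptyPure Empty      = True
    isEmptyPure (Single _) = False
    isEmptyPure (a :|| b)  = isEmptyPure a && isEmptyPure b
    isEmptyPure _          = error "requires peek"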
3.2 Configurations
Running programs are represented by configurations (Fig. 3). Configurations can
refer to the global system or a partial view of the system. A global configuration
{config} captures the complete global state, e.g., {(fut f) (task_f e)} shows
a global system containing a single task running expression e. Local configurations,
written as config, show a partial view of the state of the program. These are
multisets of tasks, futures, poison and future chains. The empty configuration is
represented by ε. Future configurations, (fut f) and (fut f v), represent unfulfilled
and fulfilled futures, respectively. Poison is the configuration (poison f)
that will eventually terminate tasks and chains writing to future f and their
dependencies. A running task (task^α_f e) has a body e and will write its result
to future f.
gconfig ::= {config}
config ::= ε | (fut f) | (fut f v) | (poison f) | (task^α_f e) | (chain^α_f g e) | config config
α ::= ' ' | †

Fig. 3: Runtime configurations.
The chain configuration (chain^α_f g e) depends on future g: when g is
fulfilled, the chain runs expression e on the value stored in g and writes the result
into future f. Concatenation of configurations, config config′, is associative and
commutative with the empty configuration as its unit (Fig. 12).
Tasks and chains have a flag α that indicates the poisoned state of the computation.
Whitespace ' ' indicates that the computation has not been poisoned,
and † indicates that the computation has been poisoned and can be safely terminated,
if it is not needed (see rule Red-Terminate of Fig. 10).
The initial configuration to evaluate expression e is {(task_f e) (fut f)}, where
the value written into future f is the result of the expression.
3.3 Reduction Rules
The operational semantics is based on small-step, reduction-context-based rules
for evaluation within tasks, and parallel reduction rules for evaluation across
configurations. Evaluation is captured by an expression-level evaluation context E
containing a hole that marks where the next step of the reduction will occur
(Fig. 4). Plugging an expression e into an evaluation context E, denoted E[e],
represents both the subexpression to be evaluated next and the result of reducing
that subexpression in context, in the standard fashion [21].

E ::= • | E e | v E | E ↝ e | v ↝ E | {E} | E || e | v || E | E ≫ e | v ≫ E
    | E ≪ e | ↑E | ↑↑E | join E | peek_π E
Fig. 4: Expression-level evaluation contexts.
Reduction of configurations is denoted config → config′, which states that
config reduces in a single step to config′.
Core Expressions. The core reduction rules (Fig. 5) for functions, tasks and
futures are well-known or derived from earlier work [5]. Together, the rules Red-
Chain and Red-ChainV describe how future chaining works, initially attaching
a closure to a future (via the chain configuration), then evaluating the closure
in a new task after the future has been fulfilled.
(Red-β)  (task^α_g E[(λx.e) v]) → (task^α_g E[e[v/x]])

(Red-Async)  fresh f
(task^α_g E[async e]) → (fut f) (task^α_f e) (task^α_g E[f])

(Red-FutV)  (task^α_f v) (fut f) → (fut f v)

(Red-Chain)  fresh h
(task^α_g E[f ↝ v]) → (fut h) (chain^α_h f v) (task^α_g E[h])

(Red-ChainV)  (chain^α_g f e) (fut f v) → (task^α_g (e v)) (fut f v)

Fig. 5: Core reduction rules.
Sequencing. The sequencing combinator creates pipeline parallelism. Its semantics
are defined inductively on the structure of ParT collections (Fig. 6). The
second argument must be a function (tested in function application, but guaranteed
by the type system). In Red-SeqS, sequencing an empty ParT results
in another empty ParT. A ParT with a value applies the function immediately
(Red-SeqV). A lifted future is asynchronously accessed by chaining the function
onto it (Red-SeqF). Rule Red-SeqP recursively applies ≫ v to the two
sub-collections. A future whose content is a ParT collection chains a recursive
call to ≫ v onto the future and lifts the result back into a ParT collection
(Red-SeqFP).
(Red-SeqS)  (task^α_g E[{} ≫ v]) → (task^α_g E[{}])

(Red-SeqV)  (task^α_g E[{v} ≫ v′]) → (task^α_g E[{v′ v}])

(Red-SeqF)  (task^α_g E[↑f ≫ v]) → (task^α_g E[↑(f ↝ v)])

(Red-SeqFP)  (task^α_g E[↑↑f ≫ v]) → (task^α_g E[↑↑(f ↝ (λx. x ≫ v))])

(Red-SeqP)  (task^α_g E[(v₁ || v₂) ≫ v]) → (task^α_g E[(v₁ ≫ v) || (v₂ ≫ v)])

Fig. 6: Reduction rules for the sequence combinator.
Join. The join combinator flattens nested ParT collections of type Par (Par t)
(Fig. 7). Empty collections flatten to empty collections (Red-JoinS). Rule Red-
JoinV extracts the singleton value from a collection. A lifted future that con-
tains a ParT (type Fut (Par t)) is simply lifted to a ParT collection (Red-
JoinF). In Red-JoinFP, a future containing a nested ParT collection (type
Fut (Par (Par t))) chains a call to join to flatten the inner structure. Rule
Red-JoinP applies the join combinator recursively to the values in the ParT
collection.
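The pure cases of join again mirror the list monad; a sketch over the Par type from Section 2 (note that join ↑f is itself pure: it just re-embeds the future using the double lift, exactly as rule Red-JoinF does):

joinPar :: Par (Par a) -> Par a
joinPar Empty      = Empty
joinPar (Single p) = p                        -- cf. Red-JoinV
joinPar (p :|| q)  = joinPar p :|| joinPar q  -- cf. Red-JoinP
joinPar (LiftF f)  = LiftFP f                 -- cf. Red-JoinF
joinPar (LiftFP _) = error "cf. Red-JoinFP: chains join onto the future"

With this in hand, bindPar p f coincides with joinPar (seqPar p f), matching the desugaring e₁ ≫= e₂ ≜ join (e₁ ≫ e₂).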
(Red-JoinS)  (task^α_g E[join {}]) → (task^α_g E[{}])

(Red-JoinV)  (task^α_g E[join {v}]) → (task^α_g E[v])

(Red-JoinF)  (task^α_g E[join ↑f]) → (task^α_g E[↑↑f])

(Red-JoinFP)  (task^α_g E[join ↑↑f]) → (task^α_g E[↑↑(f ↝ (λx. join x))])

(Red-JoinP)  (task^α_g E[join (v₁ || v₂)]) → (task^α_g E[(join v₁) || (join v₂)])

Fig. 7: Reduction rules for the join combinator.
(Red-Prune)  fresh f
(task^α_g E[v ≪ v′]) → (fut f) (task^α_f (peek v′)) (task^α_g E[v f])

(Red-PeekS∅)  (task^α_g E[peek^∅ {}]) → ε

(Red-PeekS)  (task^α_g E[peek {}]) (fut g) → (fut g Nothing) (poison g)

(Red-PeekV)  (task^α_g E[peek_π ({v} || v′)]) (fut g) → (fut g (Just v)) (poison g) ⋃_{h ∈ deps(v′)} (poison h)

(Red-PeekF)  (task^α_g E[peek_π (↑f || v)]) → (chain^α_g f (λx. peek_π {x})) (task^α_g (peek^∅ v))

(Red-PeekFP)  fresh h
(task^α_g E[peek_π (↑↑f || v)]) →
    (chain^α_g f (λx. peek_π (x || ↑↑(h ↝ maybe2par)))) (fut h) (task^α_h (peek v)) (chain^α_g h (λx. peek^∅ (maybe2par x)))

Fig. 8: Reduction rules for pruning. Singleton collections are handled via the
equality v = v || {}.
Prune and Peek. Pruning is the most complicated part of the calculus, though
most of the work is done using the peek combinator (Fig. 8). Firstly, rule Red-Prune
spawns a new task that will peek the collection v′, and passes this new
task's future to the function v. The essence of the peek rules is to set up a bunch
of computations that compete to write into a single future, with the strict
requirement that Nothing is written only when all competing tasks cannot produce
a value—that is, when the ParT being peeked is empty. This is challenging due
to the lifted future ParTs (type Fut (Par t)) within a collection, because such a
future may be empty, but this fact cannot easily be seen in a non-blocking way.
Another challenge is to avoid introducing sequential dependencies between entities
that can potentially run in parallel, to avoid, for instance, a non-terminating
computation blocking one that will produce a result.
A task that produces a ParT containing a value (rule Red-PeekV) writes the
value, wrapped in an option type, into the future and poisons all computations
writing into that future, recursively poisoning direct dependencies. The ∅ status
on peek prevents certain peek invocations from writing a final empty result, as
in rule Red-PeekS∅. Contrast with Red-PeekS, in which a task resulting in
an empty ParT writes Nothing into the future — in this case it is guaranteed
that no other peek exists writing to the future.
A lifted future ↑f is guaranteed to produce a result, though it may not
produce it in a timely fashion. This case is handled (rule Red-PeekF) by chaining
a function onto it that will ultimately write into future g when the value is
produced, if it wins the race. Otherwise, the result of peeking into v is written into
g, unless the value produced is {} (which is controlled by ∅).
A lifted future to a ParT (↑↑f) is not necessarily guaranteed to produce a result,
and neither is any ParT that runs in parallel with it. Thus, extra care needs
to be taken to ensure that Nothing is written if and only if both are actually
empty. This is handled in rule Red-PeekFP. Firstly, a function is chained onto
the lifted future to get access to the eventual ParT collection. This is combined
with future h, which is used to peek into v via a new task.
In all cases, computations propagate the poison state α to new configurations.
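The racing aspect of peek can be illustrated with a small, standalone Haskell toy: several tasks compete to fulfil one future, and an atomic try-put guarantees that exactly one value is written. This deliberately omits the hard parts formalised above: the ∅ status, lifted ParT futures, and the poisoning of losers.

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, tryPutMVar)
import Control.Monad (void)

-- Race a list of tasks to fulfil a single future with the first result;
-- an empty list immediately yields Nothing, as peek does for {}.
peekFirst :: [IO a] -> IO (MVar (Maybe a))
peekFirst tasks = do
  cell <- newEmptyMVar
  if null tasks
    then putMVar cell Nothing                                -- cf. Red-PeekS
    else mapM_ (\t -> void (forkIO (t >>= void . tryPutMVar cell . Just))) tasks
  return cell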
Scheduling. Rule Red-Schedule (Fig. 9) models the non-deterministic scheduling
of parallelism within a task, converting some of the parallelism latent in a
ParT collection into a new task. Apart from this rule, expressions within tasks
are evaluated sequentially.

(Red-Schedule)  fresh f
(task^α_g E[e₁ || e₂]) → (task^α_g E[e₁ || ↑↑f]) (fut f) (task^α_f e₂)

Fig. 9: Spawning of tasks inside a ParT.
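In the representation sketched earlier, Red-Schedule corresponds to peeling off one branch of available parallelism into a real task; a hedged Haskell sketch (here q is already a value, so the spawned task merely returns it; a real runtime would evaluate the branch):

-- Turn the right branch of a par into a task, leaving behind a lifted future.
schedule :: Par a -> IO (Par a)
schedule (p :|| q) = do
  f <- asyncTask (return q)  -- a real scheduler would evaluate q here
  return (p :|| LiftFP f)
schedule p = return p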
Poisoning and Termination. The rules for poisoning and termination (Fig. 10)
are based on a poisoned-carrier configuration defined as (PC^α_f e) ::= (task^α_f e) |
(chain^α_f g e); these rules rely on the definition of when a future is needed
(Definition 2), which in turn is defined in terms of the futures on which a task
depends to produce a value (Definition 1).
Definition 1. The dependencies of an expression e, deps(e), is the set of the
futures upon which the computation of e depends in order to produce a value:

deps(f) = {f}
deps(c) = deps({}) = deps(x) = ∅
deps({e}) = deps(λx.e) = deps(async e) = deps(↑e) = deps(↑↑e) = deps(peek_π e) = deps(join e) = deps(e)
deps(e || e′) = deps(e e′) = deps(e ↝ e′) = deps(e ≫= e′) = deps(e >< e′) = deps(e ≫ e′) = deps(e ≪ e′) = deps(e) ∪ deps(e′)
deps((task^α_f e)) = deps(e)
deps((chain^α_f g e)) = {g} ∪ deps(e).
Definition 2. A future f is needed in configuration config, denoted config ⊢
needed(f), whenever some other element of the configuration depends on it:

config ⊢ needed(f) iff ∃(PC^α_g e) ∈ config . f ∈ deps((PC^α_g e)), or (fut f) ∉ config.
(Red-Poison)  (poison f) (PC_f e) → (poison f) (PC^†_f e) ⋃_{g ∈ deps((PC_f e))} (poison g)

(Red-Terminate)  ¬(config ⊢ needed(f))
{(PC^†_f e) config} → {config}

Fig. 10: Poisoning reduction rules.
Configurations go through a two-step process before being terminated. In
the first step (rule Red-Poison) the poisoning of future f poisons any task
or chain writing to f, marks it with †, and the poison is transmitted to the
direct dependencies of the expression e in the task or chain. In the second step
(Red-Terminate), a poisoned configuration is terminated when there is no
other configuration relying on its result — that is, a poisoned task or chain is
terminated if there is no expression around to keep it alive. This rule is global,
referring to the entire configuration. Termination can be implemented using
tracing garbage collection, though in the semantics a more global specification
of dependency is used.
An example (Fig. 11) illustrates how poisoning and termination work to prevent
a task that is still needed from being terminated. Initially, there is a bunch of
tasks (squares) and futures (circles) (Fig. 11A), where one of the tasks completes
and writes a value to future f. This causes all of the other tasks writing to f to
be poisoned, via rule Red-PeekV (Fig. 11B). After application of rule Red-Poison,
the dependent tasks and futures are recursively poisoned (Fig. 11C).
Finally, the application of rule Red-Terminate terminates tasks that are not
needed (Fig. 11D). Task e₁ is not terminated, as future g is required by the task
computing e g.
Configurations. The concatenation operation on configurations is commutative
and associative and has the empty configuration as its unit (Fig. 12). We assume
that these equivalences, along with the monoid axioms for ||, can be applied at
any time during reduction.
The reduction rules for configurations (Fig. 13) have the individual configuration
reduction rules at their heart, along with standard rules for parallel evaluation
of non-conflicting sub-configurations, as is standard in rewriting logic [14].
[Fig. 11 diagram: four panels (A–D) showing a configuration of tasks and futures
as one task writes {v} into future f (Red-PeekV), its competitors and their
dependencies are poisoned (Red-Poison), and the unneeded ones are removed
(Red-Terminate), while task e₁, whose future g is still used by the task computing
e g, survives.]

Fig. 11: Safely poisoning and terminating a configuration. The letter in the
top right corner indicates the order. Tasks are represented by squares, contain
a body and have an arrow to the future they write to. Futures (circles) have
dotted arrows to tasks that use them. Grey represents poisoned configurations.
Terminated configurations are removed.
config ε ≡ config        config config′ ≡ config′ config
config (config′ config″) ≡ (config config′) config″
config ≡ config′  ⟹  {config} ≡ {config′}

Fig. 12: Configuration equivalence modulo associativity and commutativity.
3.4 Type System
The type system (Fig. 14) assigns the following types to terms:

τ ::= K | Fut τ | Par τ | Maybe τ | τ → τ

where K represents the basic types, Fut τ is the type of a future containing
a value of type τ, Par τ is the type of a ParT collection of type τ, Maybe τ
represents an option type, and τ → τ represents function types. We also let ρ
range over types.
The key judgement in the type system is Γ ⊢_ρ e : τ, which asserts that, in
typing context Γ, the expression e is a well-formed term with type τ, where
ρ is the expected return type of the task in which this expression appears —
ρ is required to type peek. The typing context contains the types of both free
variables and futures.
config → config′  ⟹  config config″ → config′ config″

config₀ → config₀′  and  config₁ → config₁′  ⟹  config₀ config₁ → config₀′ config₁′

config₀ config″ → config₀′ config″  and  config₁ config″ → config₁′ config″  ⟹  config₀ config₁ config″ → config₀′ config₁′ config″

config → config′  ⟹  {config} → {config′}

Fig. 13: Configuration reduction rules.
(TS-Const)  c is a constant of type τ  ⟹  Γ ⊢_ρ c : τ
(TS-Fut)  f : Fut τ ∈ Γ  ⟹  Γ ⊢_ρ f : Fut τ
(TS-X)  x : τ ∈ Γ  ⟹  Γ ⊢_ρ x : τ
(TS-App)  Γ ⊢_ρ e₁ : τ′ → τ  and  Γ ⊢_ρ e₂ : τ′  ⟹  Γ ⊢_ρ e₁ e₂ : τ
(TS-Fun)  Γ, x : τ ⊢_ρ e : τ′  ⟹  Γ ⊢_ρ λx.e : τ → τ′
(TS-Async)  Γ ⊢_ρ e : τ  ⟹  Γ ⊢_ρ async e : Fut τ
(TS-Chain)  Γ ⊢_ρ e₁ : Fut τ′  and  Γ ⊢_ρ e₂ : τ′ → τ  ⟹  Γ ⊢_ρ e₁ ↝ e₂ : Fut τ
(TS-EmptyPar)  Γ ⊢_ρ {} : Par τ
(TS-SingletonPar)  Γ ⊢_ρ e : τ  ⟹  Γ ⊢_ρ {e} : Par τ
(TS-LiftF)  Γ ⊢_ρ e : Fut τ  ⟹  Γ ⊢_ρ ↑e : Par τ
(TS-LiftFP)  Γ ⊢_ρ e : Fut (Par τ)  ⟹  Γ ⊢_ρ ↑↑e : Par τ
(TS-Par)  Γ ⊢_ρ e₁ : Par τ  and  Γ ⊢_ρ e₂ : Par τ  ⟹  Γ ⊢_ρ e₁ || e₂ : Par τ
(TS-Sequence)  Γ ⊢_ρ e₁ : Par τ′  and  Γ ⊢_ρ e₂ : τ′ → τ  ⟹  Γ ⊢_ρ e₁ ≫ e₂ : Par τ
(TS-Join)  Γ ⊢_ρ e : Par (Par τ)  ⟹  Γ ⊢_ρ join e : Par τ
(TS-Otherwise)  Γ ⊢_ρ e₁ : Par τ  and  Γ ⊢_ρ e₂ : Par τ  ⟹  Γ ⊢_ρ e₁ >< e₂ : Par τ
(TS-Peek)  Γ ⊢_{Maybe ρ} e : Par ρ  ⟹  Γ ⊢_{Maybe ρ} peek_π e : τ
(TS-Prune)  Γ ⊢_ρ e₁ : Fut (Maybe τ) → Par τ′  and  Γ ⊢_ρ e₂ : Par τ  ⟹  Γ ⊢_ρ e₁ ≪ e₂ : Par τ′
(TS-Bind)  Γ ⊢_ρ e₁ : Par τ′  and  Γ ⊢_ρ e₂ : τ′ → Par τ  ⟹  Γ ⊢_ρ e₁ ≫= e₂ : Par τ

Fig. 14: Expression typing.
Rule TS-Async gives the type for task creation and rule TS-Chain shows
how to operate on such values — future chaining has the type of map for the
Fut constructor. Rules TS-EmptyPar, TS-SingletonPar, TS-LiftF, TS-LiftFP,
and TS-Par give the typings for constructing ParT collections. Rule
TS-Sequence implies that sequencing has the type of map for the Par constructor.
TS-Bind and TS-Join give ≫= and join the types of the monadic
bind and join operators for the Par constructor, respectively. Rule TS-Prune
captures the communication between the two parameters via the future passed
as an argument to the first parameter — the future will contain the first value
of the second parameter if there is one, captured by the Maybe type. Rule TS-Peek
captures the conversion of the singleton or empty argument of peek from
Par ρ to Maybe ρ, the expected result type of the surrounding task. Because
peek terminates the task and does not return locally, its return type can be any
type.
Well-formed configurations (Fig. 15) are expressed by the judgement Γ ⊢
config ok, where Γ contains the assumptions about the future types in config.

(T-Fut)  f ∈ dom(Γ)  ⟹  Γ ⊢ (fut f) ok
(T-FutV)  f : Fut τ ∈ Γ  and  Γ ⊢_τ v : τ  ⟹  Γ ⊢ (fut f v) ok
(T-Poison)  f : Fut τ ∈ Γ  ⟹  Γ ⊢ (poison f) ok
(T-Task)  f : Fut τ ∈ Γ  and  Γ ⊢_τ e : τ  ⟹  Γ ⊢ (task^α_f e) ok
(T-Chain)  f₁ : Fut τ₁ ∈ Γ  and  f₂ : Fut τ₂ ∈ Γ  and  Γ ⊢_{τ₂} e : τ₁ → τ₂  ⟹  Γ ⊢ (chain^α_{f₂} f₁ e) ok
(T-Config)  Γ ⊢ config₁ ok  and  Γ ⊢ config₂ ok  and  futset(config₁) ∩ futset(config₂) = ∅  ⟹  Γ ⊢ config₁ config₂ ok
(T-GConfig)  Γ ⊢ config ok  and  dom(Γ) = futset(config)  and  TaskSafe(config)  and  AcyclicDep(config)  ⟹  Γ ⊢ {config} ok

Fig. 15: Configuration typing.
Rules T-Task and T-Chain propagate the eventual expected result type on the
turnstile ⊢ when typing the enclosed expression. Rule T-Config depends upon the
following definition, a function that collects all futures defined in a configuration:

Definition 3. Define futset(config) as:

futset((fut f)) = futset((fut f v)) = {f}
futset(config₁ config₂) = futset(config₁) ∪ futset(config₂)
futset(ε) = ∅.

Rule T-GConfig defines the well-formedness of global configurations, judgement
Γ ⊢ {config} ok. This rule depends on a number of definitions that capture
properties of futures and tasks and of the dependencies between futures. The
invariance of these properties is ultimately used to prove type soundness and other
safety properties of the system.
Definition 4. Define the following functions for collecting the different kinds of
tasks and chains of a configuration:

regular_f(config) = {(task_f e) ∈ config | e ≠ peek_π e′} ∪ {(chain_f g e) ∈ config | e ≠ λ_. peek_π e′}
peeker_f(config) = {(task_f (peek e)) ∈ config} ∪ {(chain_f g (λ_. peek e)) ∈ config}
peeker^∅_f(config) = {(task_f (peek^∅ e)) ∈ config} ∪ {(chain_f g (λ_. peek^∅ e)) ∈ config}
Tasks with no peek expression are called regular tasks, while peeker tasks
have the peek expression — there are both ∅- and non-∅-peeker tasks. These
functions can be used to partition the tasks and chains in a configuration into
these three kinds. These definitions consider peek expressions
only at the top level of a task, although the syntax allows them to be anywhere.
Based on the reduction rules, one can prove that peek only appears at the top
level of a task or chain, so no task or chain is excluded by these definitions.
Definition 5. Define predicate TaskSafe(config) as follows:

TaskSafe(config) iff for all f ∈ futset(config):
  |regular_f(config) ∪ peeker_f(config)| ≤ 1
  ∧ ((fut f) ∈ config ∧ (poison f) ∉ config ⟹ |regular_f(config) ∪ peeker_f(config)| = 1)
  ∧ (|regular_f(config)| = 1 ⟹ peeker_f(config) ∪ peeker^∅_f(config) = ∅)
  ∧ ((task^α_f (peek {})) ∈ config ⟹ (fut f) ∈ config ∧ peeker^∅_f(config) = ∅)
Predicate TaskSafe(config) (Definition 5) describes the structure of the configuration
config. It states that:
– there is at most one regular or non-∅-peeker task per future;
– if a future has not yet been fulfilled and it is not poisoned, then there exists
exactly one regular task or non-∅-peeker task that fulfils it;
– regular tasks and peeker tasks do not write to the same futures; and
– if a peeker task is about to fulfil a future with Nothing, then the future
is unfulfilled and no ∅-peeker task fulfilling the same future exists.
The following definition establishes dependencies between futures. Predicate
config ⊢ f ◁ g holds for every future g whose eventual value could influence the
result stored in future f.

Definition 6. Define the predicate config ⊢ f ◁ g as the least transitive relation
satisfying the following rules:

(task^α_f e) ∈ config  and  g ∈ deps(e)  ⟹  config ⊢ f ◁ g
(chain^α_f h e) ∈ config  and  g ∈ deps(e) ∪ {h}  ⟹  config ⊢ f ◁ g
(fut f v) ∈ config  and  g ∈ deps(v)  ⟹  config ⊢ f ◁ g

Definition 7. Predicate AcyclicDep(config) holds iff the relation ◁ is acyclic,
where ◁ is defined for config in Definition 6.
Rule T-GConfig for well-formed global configurations requires that precisely
the futures that appear in the typing environment Γ appear in the configuration,
that the configuration is well-formed, and that it satisfies the properties
TaskSafe and AcyclicDep. By including these properties as part of the well-formedness
rule for global configurations, type preservation (Lemma 1) makes
them invariants. These invariants on the structure of tasks and the dependency
relation together ensure that well-typed configurations are deadlock-free, as we
explore next.
3.5 Formal Properties
The calculus is sound and deadlock-free. These results extend previous work [15]
to address the pruning combinator.
Lemma 1 (Type Preservation). If Γ ⊢ {config} ok and {config} → {config′},
then there exists a Γ′ such that Γ ⊆ Γ′ and Γ′ ⊢ {config′} ok.

Proof. By induction on the derivation of {config} → {config′}. In particular, the
invariance of AcyclicDep is shown by considering the changes to the dependencies
caused by each reduction rule. The only place where new dependencies are
introduced is when new futures are created. Adding a future to the dependency
relation cannot introduce cycles. □
The following lemma states that the notion of needed, which determines
whether or not to garbage collect a poisoned task or chain, is anti-monotonic,
meaning that after a future is no longer needed according to the definitions, it
does not subsequently become needed.

Lemma 2 (Safe Task Kill). If Γ ⊢ {config} ok and {config} → {config′}, then
¬(config ⊢ needed(f)) implies ¬(config′ ⊢ needed(f)).

Proof. A future is initially created in a configuration where it is needed. If ever
a future disappears from deps(e), it can never reappear. □

This lemma rules out the situation where a task is poisoned and garbage
collected, but is subsequently needed. For instance, the application of rule Red-Terminate
in Fig. 11C kills tasks e₂, e₃, e₅ and e₆ (shown in Fig. 11D). If the
future into which these tasks were going to write were needed afterwards, there
would be a deadlock: a new task could chain on that future, but the future would
never be fulfilled.
Definition 8 (Terminal Configuration). A global configuration {config} is
terminal iff every element of config has one of the following shapes: (fut f),
(fut f v) or (poison f).

Lemma 3 (Deadlock-Freedom/Progress). If Γ ⊢ {config} ok, then config is
a terminal configuration, or there exists a config′ such that {config} → {config′}.

Proof. By induction on a derivation of {config} → {config′}, relying on the
invariance of AcyclicDep and Lemma 2. □

Deadlock-freedom guarantees that some reduction rule can be applied to a
well-typed, non-terminal, global configuration — this is essentially the progress
property required to prove type safety. It implies further that there are no local
deadlocks, such as the deadlocked configuration (chain_f g e) (chain_g f e′).
Such a configuration fails to satisfy the AcyclicDep invariant, and thus cannot
exist. If mutable state is added to the calculus, deadlock-freedom is lost.
Implementations. There are two prototypes of the ParT abstraction. In the
first prototype,² ParT has been written as an extension to the Encore compiler
(written in Haskell) and runtime (written in C), but it can be implemented in
well-established languages with notions of tasks and futures. This prototype
integrates futures produced by tasks and active objects with the ParT abstraction.
The other prototype has been written in Clojure,³ which is not statically typed.
Both prototypes follow the semantics to guide the implementation. In practice,
this means that the semantic rules are written in such a way that they can be
easily mimicked in a library or in a language runtime.
4 Related Work
Our combinators have been adapted from those of the Orc [11, 12] programming
language. In ParT, these combinators are completely asynchronous and are inte-
grated with futures. ParTs are first-class citizens and can be nested (Par (Par t)),
neither of which is possible in Orc, which sits on top of the expressions being
coordinated and offers only a flat collection of values.
AlTurki and Meseguer [1] used rewriting logic semantics and Maude to provide a
distributed implementation of Orc. Their focus on the semantic model allows
them to model check Orc programs. In this paper, our semantics is more fine-
grained, and guides the implementation in a multicore setting.
ParT uses a monad to encapsulate asynchronous computations, which is not
a new idea [3,13,20]. For instance, F# expresses asynchronous workflows using a
continuation monad [20] but cannot create more parallelism within the monad,
making the model better suited for event-based programming. In contrast, our
approach can spawn parallel computations and include them within ParTs.
Other work implements Orc combinators in terms of a monad within the pure
functional language Haskell [3,13]. One of these approaches [3] relies on threads
and channels and implements the prune combinator using sequential compo-
sition, losing potential parallelism. The other approach [13] uses Haskell threads
and continuations to model parallel computations and re-designs the prune
combinator in terms of a cut combinator that sparks off parallel computa-
tions, waits until there is a value available and terminates, in bulk, the remain-
ing computations. In contrast, the ParT abstraction relies on more lightweight
tasks instead of threads, has fully asynchronous combinators, which maintain
the throughput of the system, and terminates speculative work by recursively
poisoning dependencies and terminating computations that are not needed.
An approach to increase parallelism is to create parallel versions of existing
collections. For instance, Haskell [10] adds parallel operations to its collections,
and the Scala parallel collections [18] add new methods to their collections, par
and seq, that return a parallel and a sequential version of the collection. However,
these approaches cannot coordinate complex workflows, which is possible with
the ParT abstraction.
² Encore ParT prototype: http://52.50.101.143/kompile/encore/
³ Clojure ParT prototype: https://github.com/kikofernandez/ParT
Recent approaches to creating pipeline parallelism are the Flowpool [19] and
FlumeJava [4] abstractions. In the former, functions are attached to a Flowpool
and, with the foreach combinator, the attached functions are applied to items
asynchronously added to the Flowpool, thereby creating parallel pipelines of
computations. The latter, FlumeJava, is a library extending the MapReduce
framework; it provides high-level constructs to create efficient data-parallel
pipelines of MapReduce jobs via an optimisation phase. The ParT abstraction can
create data-parallel pipelines with the sequence (≫) and bind (≫=) combinators
(at the moment there is no optimisation phase) and can further terminate
speculative work.
Existing approaches to safely terminating speculative parallelism [6, 9, 17] do
not integrate well with the ParT abstraction. For instance, the Cilk programming
language provides the abort keyword to terminate all speculative work
generated by a procedure [6]. Termination does not happen immediately; instead,
computations are marked as not-runnable, and computations that are already
running do not stop until their work is finished. In other approaches, the
developer specifies termination checkpoints at which a task may be terminated
[9, 17]. This solves the previous problem and improves responsiveness, but adds
extra overhead (for the checking) and puts the responsibility on the developer,
who must specify the location of the checkpoints. In our design, the developer
does not need to specify checkpoints, and speculative work is terminated as soon
as it has no dependencies. No other approach considers that the results of tasks
may be needed elsewhere.
5 Conclusion and Future Work
This paper presented the ParT asynchronous, parallel collection abstraction,
and a collection of combinators that operate over it. ParT was formalised as a
typed calculus of tasks, futures and Orc-like combinators. A primary character-
istic of the calculus is that it captures the non-blocking implementation of the
combinators, including an algorithm for pruning that tracks down dependencies
and is safe with respect to shared futures. The ParT abstraction has prototypes
in the Encore (statically typed) and Clojure (dynamically typed) programming
languages.
Currently, the calculus does not support side-effects. These are challenging to
deal with, due to potential race conditions and terminated computations leav-
ing objects in an inconsistent state. We expect that Encore’s capability type
system [2] can be used to avoid data races, and a run-time, transactional mecha-
nism can deal with the inconsistent state. At the start of the paper we mentioned
that ParT was integrated into an actor-based language, but the formalism in-
cluded no actors. This work abstracted away the actors, replacing them by tasks
and futures—message sends in the Encore programming language return results
via futures—which were crucial for tying together the asynchronous computa-
tions underlying a ParT. Actors can easily be re-added as soon as the issues of
shared mutable state have been addressed. The distribution aspect of actors has
not yet been considered in Encore or in the ParT abstraction. This would be
an interesting topic for future work. Beyond these extensions, we also plan to
extend the range of combinators supporting the ParT abstraction.
References
1. Musab AlTurki and José Meseguer. Dist-Orc: A rewriting-based distributed im-
plementation of Orc with formal analysis. In Peter Csaba Ölveczky, editor, Pro-
ceedings First International Workshop on Rewriting Techniques for Real-Time Sys-
tems, RTRTS 2010, Longyearbyen, Norway, April 6-9, 2010., volume 36 of EPTCS,
pages 26–45, 2010.
2. Stephan Brandauer, Elias Castegren, Dave Clarke, Kiko Fernandez-Reyes,
Einar Broch Johnsen, Ka I. Pun, Silvia Lizeth Tapia Tarifa, Tobias Wrigstad,
and Albert Mingkun Yang. Parallel objects for multicores: A glimpse at the paral-
lel language Encore. In Marco Bernardo and Einar Broch Johnsen, editors, Formal
Methods for Multicore Programming - 15th International School on Formal Methods
for the Design of Computer, Communication, and Software Systems, SFM 2015,
Bertinoro, Italy, June 15-19, 2015, Advanced Lectures, volume 9104 of Lecture
Notes in Computer Science, pages 1–56. Springer, 2015.
3. Marco Devesas Campos and Luís Soares Barbosa. Implementation of an orchestra-
tion language as a Haskell domain specific language. Electr. Notes Theor. Comput.
Sci., 255:45–64, 2009.
4. Craig Chambers, Ashish Raniwala, Frances Perry, Stephen Adams, Robert R.
Henry, Robert Bradshaw, and Nathan Weizenbaum. FlumeJava: Easy, efficient
data-parallel pipelines. In Proceedings of the 31st ACM SIGPLAN Conference
on Programming Language Design and Implementation, PLDI ’10, pages 363–375,
New York, NY, USA, 2010. ACM.
5. Dave Clarke and Tobias Wrigstad. Vats: A safe, reactive storage abstraction.
In Erika Ábrahám, Marcello M. Bonsangue, and Einar Broch Johnsen, editors,
Theory and Practice of Formal Methods – Essays Dedicated to Frank de Boer on
the Occasion of His 60th Birthday, volume 9660 of Lecture Notes in Computer
Science, pages 140–154. Springer, 2016.
6. Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of
the Cilk-5 multithreaded language. In Jack W. Davidson, Keith D. Cooper, and
A. Michael Berman, editors, Proceedings of the ACM SIGPLAN ’98 Conference on
Programming Language Design and Implementation (PLDI), Montreal, Canada,
June 17-19, 1998, pages 212–223. ACM, 1998.
7. Robert H. Halstead, Jr. Multilisp: A language for concurrent symbolic computa-
tion. ACM Trans. Program. Lang. Syst., 7(4):501–538, October 1985.
8. Rich Hickey. The Clojure programming language. In Johan Brichau, editor, Pro-
ceedings of the 2008 Symposium on Dynamic Languages, DLS 2008, July 8, 2008,
Paphos, Cyprus, page 1. ACM, 2008.
9. Shams Imam and Vivek Sarkar. The Eureka programming model for speculative
task parallelism. In John Tang Boyland, editor, 29th European Conference on
Object-Oriented Programming, ECOOP 2015, July 5-10, 2015, Prague, Czech Re-
public, volume 37 of LIPIcs, pages 421–444. Schloss Dagstuhl - Leibniz-Zentrum
fuer Informatik, 2015.
10. Simon L. Peyton Jones. Harnessing the multicores: Nested data parallelism in
Haskell. In G. Ramalingam, editor, Programming Languages and Systems, 6th
Asian Symposium, APLAS 2008, Bangalore, India, December 9-11, 2008. Proceed-
ings, volume 5356 of Lecture Notes in Computer Science, page 138. Springer, 2008.
11. David Kitchin, William R. Cook, and Jayadev Misra. A language for task or-
chestration and its semantic properties. In Proceedings of the 17th International
Conference on Concurrency Theory, CONCUR’06, pages 477–491, Berlin, Heidel-
berg, 2006. Springer-Verlag.
12. David Kitchin, Adrian Quark, William Cook, and Jayadev Misra. The Orc pro-
gramming language. In Proceedings of the Joint 11th IFIP WG 6.1 International
Conference FMOODS ’09 and 29th IFIP WG 6.1 International Conference FORTE
’09 on Formal Techniques for Distributed Systems, FMOODS ’09/FORTE ’09,
pages 1–25, Berlin, Heidelberg, 2009. Springer-Verlag.
13. John Launchbury and Trevor Elliott. Concurrent orchestration in Haskell. In
Jeremy Gibbons, editor, Proceedings of the 3rd ACM SIGPLAN Symposium on
Haskell, Haskell 2010, Baltimore, MD, USA, 30 September 2010, pages 79–90.
ACM, 2010.
14. Narciso Martí-Oliet and José Meseguer. Rewriting logic: roadmap and bibliogra-
phy. Theor. Comput. Sci., 285(2):121–154, 2002.
15. Daniel McCain. Parallel combinators for the Encore programming language. Mas-
ter’s thesis, Uppsala University, 2016.
16. Martin Odersky, Lex Spoon, and Bill Venners. Programming in Scala: A Compre-
hensive Step-by-step Guide. Artima Incorporation, USA, 1st edition, 2008.
17. Tim Peierls, Brian Goetz, Joshua Bloch, Joseph Bowbeer, Doug Lea, and David
Holmes. Java Concurrency in Practice. Addison-Wesley Professional, 2005.
18. Aleksandar Prokopec, Phil Bagwell, Tiark Rompf, and Martin Odersky. A generic
parallel collection framework. In Emmanuel Jeannot, Raymond Namyst, and Jean
Roman, editors, Euro-Par 2011 Parallel Processing - 17th International Confer-
ence, Euro-Par 2011, Bordeaux, France, August 29 - September 2, 2011, Proceed-
ings, Part II, volume 6853 of Lecture Notes in Computer Science, pages 136–147.
Springer, 2011.
19. Aleksandar Prokopec, Heather Miller, Tobias Schlatter, Philipp Haller, and Martin
Odersky. Flowpools: A lock-free deterministic concurrent dataflow abstraction. In
Hironori Kasahara and Keiji Kimura, editors, Languages and Compilers for Parallel
Computing, 25th International Workshop, LCPC 2012, Tokyo, Japan, September
11-13, 2012, Revised Selected Papers, volume 7760 of Lecture Notes in Computer
Science, pages 158–173. Springer, 2012.
20. Don Syme, Tomas Petricek, and Dmitry Lomov. The F# asynchronous program-
ming model. In Ricardo Rocha and John Launchbury, editors, Practical Aspects of
Declarative Languages - 13th International Symposium, PADL 2011, Austin, TX,
USA, January 24-25, 2011. Proceedings, volume 6539 of Lecture Notes in Computer
Science, pages 175–189. Springer, 2011.
21. Andrew K. Wright and Matthias Felleisen. A syntactic approach to type soundness.
Inf. Comput., 115(1):38–94, 1994.
... The notion of ParT [28] represents an array of futures, representing results from different invocations, effectively representing data among parallel processes, even if the implementation of ParT is not distributed. Synchronizing elements of a ParT effectively gathers data from parallel processes. ...
... For usual active objects, a distributed result may be produced through multiple active objects, and represented by a collection of future objects. More optimized solutions based on this principle were developed such as ParT [28]. We could adopt a solution based on having an array of futures to provide a distributed result, but this would require multiple active objects and we are more interested in working with a single active object to produce a result in parallel. ...
Thesis
Full-text available
This thesis presents a hybrid programming model between two parallel programming models : active objects and BSP (Bulk Synchronous Parallel). Active objects are specialized in task parallelism ; they enable the execution of functionally different codes in parallel and the exchange of their results thanks to futures, which represent these results before they are available. The BSP model enables a quite different parallelism from the one provided by active objects : data-parallelism. This form of parallelism consists of cutting a task into several pieces in order to process them faster in parallel. These two specialized models enable high-level programming and provide interesting properties such as ease of programming and determinism under certain conditions. The point of combining these two models is therefore to allow the writing of programs combining task-parallelism and data-parallelism, while benefiting from the properties of the two models. This thesis studies this new BSP active object model under a theoretical aspect (with operational semantics) and a practical aspect (with a C++/MPI implementation). We also introduce a new concept of distributed future. Our distributed futures consist in unifying the concepts of futures and distributed vectors in order to represent distributed data. This allows a better integration between active objects and BSP. With our distributed futures, our BSP active objects can communicate efficiently with each other in parallel. The efficiency of these distributed futures is shown through benchmark scenarios executed on our implementation. They allow us to confirm a performance improvement of our distributed futures against classical futures.
... The first one tries to improve the performance of all mechanisms used to execute Actors efficiently, mainly the Actor scheduling strategies [6,20,34,35]. The second approach, instead, follows the direction of extending the AM with new features and constructs [10,19,22,25,27,33]. Our work falls in the second category. ...
... Hains et al. [21], proposed a new programming model that uses Active Objects to coordinate BSP (Bulk Synchronous Parallel) computations. Fernandez-Reyes et al. [19] proposed an extension of the AO model with the ParT abstraction, capable of running efficient dataparallel computations in a non-blocking fashion with the ability to execute multiple dependent ParT in parallel and to stop the execution on those values that are discovered to be irrelevant for the final result. ...
Article
Full-text available
The Actor-based programming model is largely used in the context of distributed systems for its message-passing semantics and neat separation between the concurrency model and the underlying hardware platform. However, in the context of a single multi-core node where the performance metric is the primary optimization objective, the “pure” Actor Model is generally not used because Actors cannot exploit the physical shared-memory, thus reducing the optimization options. In this work, we propose to enrich the Actor Model with some well-known Parallel Patterns to face the performance issues of using the “pure” Actor Model on a single multi-core platform. In the experimental study, conducted on two different multi-core systems by using the C++ Actor Framework, we considered a subset of the Parsec benchmarks and two Savina benchmarks. The analysis of results demonstrates that the Actor Model enriched with suitable Parallel Patterns implementations provides a robust abstraction layer capable of delivering performance results comparable with those of thread-based libraries (i.e. Pthreads and FastFlow) while offering a safer and versatile programming environment.
... Complete configurations can be typed by adding extra conditions to ensure that all futures in Γ have a future configuration, there is a one-to-one correspondence between tasks/chains and unfulfilled futures, and dependencies between tasks are acyclic. These definitions have been omitted and are similar to those found in our earlier work [9]. ...
... Formal Properties The proof of soundness of the type system follows standard techniques [8]. The proof of progress requires that there is no deadlock, which follows as there is no cyclic dependency between tasks [9]. ...
... Closures that do not capture local state in a way that could lead to data races can be run independently of the actor that created them. Such closures can be used as a source of parallelism For example, Encore's parallel combinators [13] uses such closures to express asynchronous, speculative parallel pipelines. Ideally, in order to reason about parallelism, the programmer needs to know whether a closure can be run independently. ...
Conference Paper
Expressive actor models combine aspects of functional programming into the pure actor model enriched with futures. Such functional features include first-class closures which can be passed between actors and chained on futures. Combined with mutable objects, this opens the door to race conditions. In some situations, closures may not be evaluated by the actor that created them yet may access fields or objects owned by that actor. In other situations, closures may be safely fired off to run as a separate task. This paper discusses the problem of who can safely evaluate a closure to avoid race conditions, and presents the current solution to the problem adopted by the Encore language. The solution integrates with Encore's capability type system, which influences whether a closure is attached and must be evaluated by the creating actor, or whether it can be detached and evaluated independently of its creator. Encore's current solution to this problem is not final or optimal. We conclude by discussing a number of open problems related to dealing with closures in the actor model.
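A rough Clojure analogy for the attached/detached distinction (Encore's capability type system has no direct counterpart here): state owned by an actor lives in an agent, closures that touch it stay attached by being sent to the agent, and closures that capture only immutable values can be detached and run as independent futures.

  (def counter (agent 0))

  ;; Attached: mutates actor-owned state, so it runs serialised on the agent.
  (send counter inc)

  ;; Detached: captures only immutable values, safe to run anywhere.
  (def detached (future (reduce + (range 1000))))

  (await counter)
  [@counter @detached] ;; => [1 499500]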
... ParT [8] extends Encore and Clojure to combine actors and futures. It decomposes actors into tasks; asynchronous messages are sent to these tasks and their results are futures, so actors themselves are not directly supported. ...
Conference Paper
Developers often combine different concurrency models in a single program, in each part of the program using the model that fits best. Many programming languages, such as Clojure, Scala, and Haskell, cater to this need by supporting different concurrency models. However, they are often combined in an ad hoc way and the semantics of the combination is not always well defined. This paper studies the combination of three concurrency models: futures, actors, and transactions. We show that a naive combination of these models invalidates the guarantees they normally provide, thereby breaking the assumptions of developers. Hence, we present Chocola: a unified framework of futures, actors, and transactions that maintains the guarantees of all models wherever possible, even when they are combined. We present the semantics of this model and its implementation in Clojure, and have evaluated its performance and expressivity using three benchmark applications.
... It is worth mentioning that futures are also used to implement some more complex synchronisation patterns. For example, Encore can use futures to coordinate parallel computations [Fernandez-Reyes et al., 2016], featuring operators to gather futures or to perform computation pipelining. In ASP and ProActive, groups of futures can be created to represent the results of group communications, enabling SPMD computation with active objects [Baduel et al., 2005]. ...
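The gathering and pipelining operators mentioned above can be approximated with plain futures. In the following Clojure sketch, gather and pipe are our names for illustration, not Encore's actual operators.

  ;; gather turns a group of futures into one future of a vector;
  ;; pipe chains a further stage onto a future without blocking the caller.
  (defn gather [futs] (future (mapv deref futs)))
  (defn pipe   [fut f] (future (f @fut)))

  (def stage1 (mapv #(future (* % %)) (range 4)))    ; [0 1 4 9]
  (def stage2 (pipe (gather stage1) #(reduce + %)))
  @stage2 ;; => 14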
Thesis
The active object concept is a powerful computational model for defining distributed and concurrent systems. This model has recently gained prominence, largely thanks to its simplicity and its level of abstraction. In this work we study an active object model with no explicit future type and wait-by-necessity synchronisation, a lightweight technique that synchronises invocations when the corresponding values are strictly needed. Although high concurrency combined with a high level of transparency leads to good performance, it also makes the system more prone to problems such as deadlocks, which is what led us to study deadlock analysis in this active object model. The development of our deadlock analysis is divided into two main parts. In the first, we focus on the implicit synchronisation on the availability of some value; this allows us to analyse the data-flow synchronisation inherent to languages that feature wait-by-necessity. In the second, we present a static analysis technique based on effects and behavioural types for deriving the synchronisation patterns of stateful active objects and verifying the absence of deadlocks in this context. Our effect system traces accesses to object fields, allowing us to compute behavioural types that express synchronisation patterns in a precise way. As a consequence, we can automatically verify the absence of deadlocks in active-object-based programs with wait-by-necessity synchronisation and stateful active objects.
Article
Distributed systems are challenging to design properly and prove correct due to their heterogeneous and distributed nature. These challenges depend on the programming paradigms used and their semantics. The actor paradigm has the advantage of offering a modular semantics, which is useful for compositional design and analysis. Shared-variable concurrency and race conditions are avoided by means of asynchronous message passing. The object-oriented paradigm is popular due to its facilities for program structuring and code reuse. These paradigms have been combined by means of concurrent objects, where remote method calls are transmitted by message passing and low-level synchronization primitives are avoided. Such objects may exhibit active behavior and are often called active objects. In this setting the concept of futures is central and is used by a number of languages. Futures offer a flexible way of communicating and sharing computation results. However, futures come with a cost, for instance with respect to the underlying implementation support, including garbage collection; in particular, this raises a problem for IoT systems. The purpose of this paper is to reconsider and discuss the future mechanism and compare it to other alternatives, evaluating factors such as expressiveness and efficiency, as well as syntactic and semantic complexity, including ease of reasoning. We limit the discussion to the setting of imperative active objects and explore the various mechanisms together with their weaknesses and advantages. A surprising result (at least to the authors) is that the need for futures in this setting seems to be overrated.
Chapter
Among the programming models for parallel and distributed computing, one can identify two important families: programming models adapted to data-parallelism, where a set of coordinated processes performs a computation by splitting the input data, and coordination languages able to express complex coordination patterns and rich interactions between processing entities. This article takes two successful programming models belonging to these two categories and puts them together into an effective programming model. More precisely, we investigate the use of active objects to coordinate BSP processes. We choose two paradigms that both enforce the absence of data races, one of the major sources of error in parallel programming. This article explains why we believe such a model is interesting and provides a formal semantics integrating the notions of the two programming paradigms in a coherent and effective manner.
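The compute-then-synchronise structure of a BSP superstep can be sketched in a few lines of Clojure; there is no message passing or MPI here, and forcing pmap merely plays the role of the barrier.

  ;; Each worker processes its local chunk in parallel; forcing the lazy
  ;; pmap acts as the barrier, so the next superstep sees every worker's
  ;; output from the previous one.
  (defn superstep [step-fn chunks]
    (vec (pmap step-fn chunks)))

  (def chunks [[1 2] [3 4] [5 6]])
  (def step1 (superstep #(mapv inc %) chunks)) ;; => [[2 3] [4 5] [6 7]]
  (def step2 (superstep #(reduce + %) step1))  ;; => [5 9 13]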
Book
This book constitutes the proceedings of the 18th International Conference on Coordination Models and Languages, COORDINATION 2016, held in Heraklion, Crete, Greece, in June 2016, as part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016. The 16 full papers included in this volume were carefully reviewed and selected from 44 submissions. The papers cover a wide range of topics and techniques related to system coordination, including: programming and communication abstractions; communication protocols and behavioural types; actors and concurrent objects; tuple spaces; games, interfaces and contracts; information flow policies and dissemination techniques; and probabilistic models and formal verification.
Article
As the number of cores increases in modern multiprocessors, it is becoming increasingly difficult to write general purpose applications that efficiently utilize this computing power. Most applications manipulate structured data. Modern languages and platforms provide collection frameworks with basic data structures like lists, hashtables and trees. These data structures have a range of predefined operations which include mapping, filtering or finding elements. Such bulk operations traverse the collection and process the elements sequentially. Their implementation relies on iterators, which are not applicable to parallel operations due to their sequential nature. We present an approach to parallelizing collection operations in a generic way, used to factor out common parallel operations in collection libraries. Our framework is easy to use and straightforward to extend to new collections. We show how to implement concrete parallel collections such as parallel arrays and parallel hash maps, proposing an efficient solution to parallel hash map construction. Finally, we give benchmarks showing the performance of parallel collection operations.
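The iterator-free, generic style the paper argues for is also visible in Clojure's reducers library, shown here purely as an illustration of the same idea (this is not the paper's Scala framework).

  (require '[clojure.core.reducers :as r])

  ;; fold splits the vector and reduces the halves in parallel;
  ;; + serves as both the reducing and the combining function.
  (def v (vec (range 1000000)))
  (r/fold + (r/map inc v)) ;; => 500000500000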
Conference Paper
Implementing correct and deterministic parallel programs is challenging. Even though concurrency constructs exist in popular programming languages to facilitate the task of deterministic parallel programming, they are often too low level, or do not compose well due to underlying blocking mechanisms. In this paper, we present the design and implementation of a fundamental data structure for composable deterministic parallel dataflow computation through the use of functional programming abstractions. Additionally, we provide a correctness proof, showing that the implementation is linearizable, lock-free, and deterministic. Finally, we show experimental results which compare our FlowPool against corresponding operations on other concurrent data structures, and show that in addition to offering new capabilities, FlowPools reduce insertion time by 49–54% on a 4-core i7 machine with respect to comparable concurrent queue data structures in the Java standard library.
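The real FlowPool is lock-free and linearizable; the tiny atom-based Clojure sketch below only imitates the programming interface (append an element, register a callback that sees past and future elements) and none of the guarantees. All names are ours.

  (defn flow-pool [] (atom {:elems [] :callbacks []}))

  (defn flow-put [pool x]
    (let [{:keys [callbacks]} (swap! pool update :elems conj x)]
      (doseq [cb callbacks] (cb x))))    ; notify registered callbacks

  (defn flow-foreach [pool cb]
    (let [{:keys [elems]} (swap! pool update :callbacks conj cb)]
      (doseq [x elems] (cb x))))         ; replay elements already present

  (def p (flow-pool))
  (flow-put p 1)
  (flow-foreach p #(println "saw" %))    ;; prints: saw 1
  (flow-put p 2)                         ;; prints: saw 2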
Conference Paper
We present a concurrent scripting language embedded in Haskell, emulating the functionality of the Orc orchestration language by providing many-valued (real) non-determinism in the context of concurrent effects. We provide many examples of its use, as well as a brief description of how we use the embedded Orc DSL in practice. We describe the abstraction layers of the implementation, and use the fact that we have a layered approach to demonstrate algebraic properties satisfied by the combinators.
Conference Paper
We describe the asynchronous programming model in F#, and its applications to reactive, parallel and concurrent programming. The key feature combines a core language with a non-blocking modality to author lightweight asynchronous tasks, where the modality has control flow constructs that are syntactically a superset of the core language and are given an asynchronous semantic interpretation. This allows smooth transitions between synchronous and asynchronous code and eliminates callback-style treatments of inversion of control, without disturbing the foundation of CPU-intensive programming that allows F# to interoperate smoothly and compile efficiently. An adapted version of this approach has recently been announced for a future version of C#.
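Clojure's core.async library offers a comparable non-blocking modality, which we use here only as an analogy to the F# model (this assumes org.clojure/core.async is on the classpath).

  (require '[clojure.core.async :refer [go <! <!! timeout]])

  ;; Inside a go block, <! parks the lightweight task instead of blocking
  ;; a thread, so asynchronous code reads like ordinary sequential code.
  (defn fetch-async [url]
    (go (<! (timeout 50))                ; stand-in for non-blocking I/O
        (str "response from " url)))

  (<!! (go [(<! (fetch-async "svc/a"))
            (<! (fetch-async "svc/b"))]))
  ;; => ["response from svc/a" "response from svc/b"]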
Code
This artifact includes a Java-based library implementation of the Eureka programming model (EuPM) that simplifies the expression of speculative parallel tasks. Eureka-style computations are especially well-suited for parallel search and optimization applications. The artifact includes implementations of the eureka patterns that are supported by our Eureka API. These patterns include search, optimization, convergence, N-version programming, and soft real-time deadlines. These different patterns of computations can also be safely combined or nested in the EuPM, along with regular task-parallel constructs, thereby enabling high degrees of composability and reusability. We also include source code of the different benchmarks presented in the paper. The interested reader can use the artifact to experiment with various eureka-style applications and custom Eureka variants in the EuPM.
Chapter
The rise of multicore computers has hastened the advent of multifarious abstractions to facilitate the construction of parallel programs. This paper presents another: the vat. A vat is like a variable, but it has various actions attached to it that can block, transform and react to changes to the vat. Vats can be combined together in various ways, linking the behaviours of the vats together, resulting in various synchronisation mechanisms. Vats are powerful enough to encode (part of) many existing mechanisms including promises, condition variables, LVars and reactive programming.
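Clojure's built-in watches give a flavour of the reactive side of vats; note that this is only an analogy, since vats can additionally block and transform writes.

  (def v (atom 0))

  ;; Attach a reaction that fires whenever the value changes.
  (add-watch v :react
             (fn [_key _ref old new]
               (println "changed from" old "to" new)))

  (reset! v 42) ;; prints: changed from 0 to 42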
Article
The age of multi-core computers is upon us, yet current programming languages, typically designed for single-core computers and adapted post hoc for multi-cores, remain tied to the constraints of a sequential mindset and are thus in many ways inadequate. New programming language designs are required that break away from this old-fashioned mindset. To address this need, we have been developing a new programming language called Encore, in the context of the European Project UpScale. The paper presents a motivation for the Encore language, examples of its main constructs, several larger programs, a formalisation of its core, and a discussion of some future directions our work will take. The work is ongoing and we started more or less from scratch. That means that a lot of work has to be done, but also that we need not be tied to decisions made for sequential language designs. Any design decision can be made in favour of good performance and scalability. For this reason, Encore offers an interesting platform for future exploration into object-oriented parallel programming.
Conference Paper
MapReduce and similar systems significantly ease the task of writing data-parallel code. However, many real-world computations require a pipeline of MapReduces, and programming and managing such pipelines can be difficult. We present FlumeJava, a Java library that makes it easy to develop, test, and run efficient data-parallel pipelines. At the core of the FlumeJava library are a couple of classes that represent immutable parallel collections, each supporting a modest number of operations for processing them in parallel. Parallel collections and their operations present a simple, high-level, uniform abstraction over different data representations and execution strategies. To enable parallel operations to run efficiently, FlumeJava defers their evaluation, instead internally constructing an execution plan dataflow graph. When the final results of the parallel operations are eventually needed, FlumeJava first optimizes the execution plan, and then executes the optimized operations on appropriate underlying primitives (e.g., MapReduces). The combination of high-level abstractions for parallel data and computation, deferred evaluation and optimization, and efficient parallel primitives yields an easy-to-use system that approaches the efficiency of hand-optimized pipelines. FlumeJava is in active use by hundreds of pipeline developers within Google.
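The deferred-evaluation idea, building the whole plan first and running it only when results are demanded, can be sketched with Clojure transducers; the plan/execute names below are our toy illustration and bear no relation to FlumeJava's actual classes.

  (defn plan        [coll] {:source coll :ops []})
  (defn plan-map    [p f]  (update p :ops conj (map f)))
  (defn plan-filter [p f]  (update p :ops conj (filter f)))

  (defn execute
    "Runs the deferred pipeline. A real system would optimise the plan
    here first; comp at least fuses all stages into a single pass."
    [{:keys [source ops]}]
    (into [] (apply comp ops) source))

  (-> (plan (range 10))
      (plan-map inc)
      (plan-filter even?)
      execute) ;; => [2 4 6 8 10]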
Article
We present a new approach to proving type soundness for Hindley/Milner-style polymorphic type systems. The keys to our approach are (1) an adaptation of subject reduction theorems from combinatory logic to programming languages, and (2) the use of rewriting techniques for the specification of the language semantics. The approach easily extends from polymorphic functional languages to imperative languages that provide references, exceptions, continuations, and similar features. We illustrate the technique with a type soundness theorem for the core of Standard ML, which includes the first type soundness proof for polymorphic exceptions and continuations.
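Schematically, the syntactic approach rests on two standard lemmas, stated here in their usual form rather than as quoted from the paper:

  % Preservation (subject reduction): types are stable under evaluation.
  \Gamma \vdash e : \tau \;\land\; e \longrightarrow e' \;\implies\; \Gamma \vdash e' : \tau

  % Progress: a closed well-typed term is a value or can take a step.
  \emptyset \vdash e : \tau \;\implies\; e \text{ is a value} \;\lor\; \exists e'.\, e \longrightarrow e'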
Conference Paper
Customers and stakeholders have substantial investments in, and are comfortable with the performance, security and stability of, industry-standard platforms like the JVM and CLR. While Java and C# developers on those platforms may envy the succinctness, flexibility and productivity of dynamic languages, they have concerns about running on customer-approved infrastructure, access to their existing code base and libraries, and performance. In addition, they face ongoing problems dealing with concurrency using native threads and locking. Clojure is an effort in pragmatic dynamic language design in this context. It endeavors to be a general-purpose language suitable in those areas where Java is suitable. It reflects the reality that, for the concurrent programming future, pervasive, unmoderated mutation simply has to go. Clojure meets its goals by: embracing an industry-standard, open platform - the JVM; modernizing a venerable language - Lisp; fostering functional programming with immutable persistent data structures; and providing built-in concurrency support via software transactional memory and asynchronous agents. The result is robust, practical, and fast. This talk will focus on the motivations, mechanisms and experiences of the implementation of Clojure.
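Two of the constructs named above in their minimal form, refs coordinated by software transactional memory and an asynchronous agent; both are part of core Clojure.

  (def accounts {:a (ref 100) :b (ref 0)})

  (dosync                              ; STM: both alters commit atomically
    (alter (:a accounts) - 30)
    (alter (:b accounts) + 30))

  [@(:a accounts) @(:b accounts)]      ;; => [70 30]

  (def logger (agent []))
  (send logger conj "transfer done")   ; asynchronous, queued on the agent
  (await logger)
  @logger                              ;; => ["transfer done"]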