ParT: An Asynchronous Parallel Abstraction
for Speculative Pipeline Computations⋆
Kiko Fernandez-Reyes, Dave Clarke, and Daniel S. McCain
Department of Information Technology
Uppsala University, Uppsala, Sweden
Abstract. The ubiquity of multicore computers has forced program-
ming language designers to rethink how languages express parallelism and
concurrency. This has resulted in new language constructs and new com-
binations or revisions of existing constructs. Along these lines, we extended the
programming languages Encore (actor-based) and Clojure (functional)
with an asynchronous parallel abstraction called ParT, a data structure
that can dually be seen as a collection of asynchronous values (integrat-
ing with futures) or a handle to a parallel computation, plus a collection
of combinators for manipulating the data structure. The combinators can
express parallel pipelines and speculative parallelism. This paper presents
a typed calculus capturing the essence of ParT, abstracting away from
details of the Encore and Clojure programming languages. The calculus
includes tasks, futures, and combinators similar to those of Orc but im-
plemented in a non-blocking fashion. Furthermore, the calculus strongly
mimics how ParT is implemented, and it can serve as the basis for adap-
tation of ParT into different languages and for further extensions.
1 Introduction
The ubiquity of multicore computers has forced programming language designers
to rethink how languages express parallelism and concurrency. This has resulted
in new language constructs that, for instance, increase the degree of asynchrony
while exploiting parallelism. A promising direction is programming languages
with constructs for tasks and actors, such as Clojure and Scala [8, 16], due to
the lightweight overhead of spawning parallel computations. These languages
offer coarse-grained parallelism at the task and actor level, where futures act
as synchronisation points. However, these languages are lacking in high-level
coordination constructs over these asynchronous computations. For instance, it is
not easy to express dependence on the first result produced by a collection of futures
and to safely terminate the computations associated with the remaining futures.
The task of terminating speculative parallelism is quite delicate, as the futures
may have attached parallel computations that depend on other futures, creating
complex dependency patterns that need to be tracked down and terminated.
⋆ Partly funded by the EU project FP7-612985 UpScale: From Inherent Concurrency
to Massive Parallelism through Type-based Optimisations.
To address this need, this paper presents the design and implementation of
ParT, a non-blocking abstraction that asynchronously exploits futures and en-
ables the developer to build complex, data parallel coordination workflows using
high-level constructs. These high-level constructs are derived from the combi-
nators of the orchestration language Orc [11,12]. ParT is formally expressed in
terms of a calculus that, rather than being at a high level of abstraction, strongly
mimics how this asynchronous abstraction is implemented and is general enough
to be applied to programming languages with notions of futures.
The contributions of the paper are as follows: the design of an asynchronous
parallel data abstraction to coordinate complex workflows, including pipeline
and speculative parallelism, and a typed, non-blocking calculus modelling this
abstraction, which integrates futures, tasks and Orc-like combinators, supports
the separation of the realisation of parallelism (via tasks) from its specification,
and offers a novel approach to terminating speculative parallelism.
2 Overview
To set the scene for this paper, we begin with a brief overview of asynchronous
computations with futures and provide an informal description of the ParT ab-
straction and its combinators. A SAT solver example is used as an illustration.
In languages with notions of tasks and active objects [2, 8, 16], asynchronous
computations are created by spawning tasks or calling methods on active objects.
These computations can exploit parallelism by decoupling the execution of the
caller and the callee [7]. The result of a spawn or method call is immediately
a future, a container that will eventually hold the result of the asynchronous
computation. A future that has received a value is said to be fulfilled. Operations
on futures may be blocking, such as getting the result from a future, or may be
asynchronous, such as attaching a callback to a future. This second operation,
called future chaining and represented by f ↝ callback, immediately returns
a new future, which will contain the result of applying the callback function
callback to the contents of the original future after it has been fulfilled. A future
can also be thought of as a handle to an asynchronous computation that can be
extended via future chaining or even terminated. This is a useful perspective
that we will further develop in this work. In languages with notions of actors,
such as Clojure and Encore [2], asynchrony is the rule and blocking on futures
suffers a large performance penalty. But creating complex coordination patterns
based on a collection of asynchronous computations without blocking threads
(to maintain the throughput of the system) is no easy task.
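To make these operations concrete, the following is a minimal sketch of futures in Haskell, independent of the Encore and Clojure prototypes; the names Fut, async, get, chain and chainM are ours, and a write-once MVar-based representation is assumed.

  import Control.Concurrent (forkIO)
  import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)

  -- A future is a write-once cell, fulfilled by the task attached to it.
  newtype Fut a = Fut (MVar a)

  -- Spawn an asynchronous task; its result fulfils a fresh future.
  async :: IO a -> IO (Fut a)
  async body = do
    cell <- newEmptyMVar
    _ <- forkIO (body >>= putMVar cell)
    pure (Fut cell)

  -- Blocking access to a future's value.
  get :: Fut a -> IO a
  get (Fut cell) = readMVar cell

  -- Future chaining (f ↝ callback): returns a new future immediately;
  -- the callback runs once the original future has been fulfilled.
  chain :: Fut a -> (a -> b) -> IO (Fut b)
  chain f callback = async (callback <$> get f)

  -- Chaining with an effectful callback, used for nested ParTs below.
  chainM :: Fut a -> (a -> IO b) -> IO (Fut b)
  chainM f callback = async (get f >>= callback)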
To address this need, we have designed an abstraction, called ParT, which
can be thought of as a handle to an ongoing parallel computation, allowing
the parallel computation to be manipulated, extended, and terminated. A ParT
is a functional data structure, represented by type Par t, that can be empty,
{} :: Par t, contain a single expression, {−} :: t → Par t, contain futures attached to
computations producing values, using (−)◦ :: Fut t → Par t, or contain computations
producing ParTs, embedded using (−)† :: Fut (Par t) → Par t. Multiple ParTs
 1  def fut2Par(f: Fut (Maybe a)): Par a
 2    (f ↝ \(m: Maybe a) ->
 3       match m with Nothing => {}; Just val => {val})†
 4
 5  def evaluateFormula(form: Formula, a: Assignment): (Maybe bool, Assignment)
 6    ...
 7
 8  def sat(st: Strategy, fml: Formula, a: Assignment): Par Assignment
 9    let variable = st.getVariable(fml, a)
10        a1 = a.extendAssignment(variable, true)
11        a2 = a.extendAssignment(variable, false)
12    in
13      ({evaluateFormula(fml, a1)} || {evaluateFormula(fml, a2)}) >>=
14        \(result: (Maybe bool, Assignment)) ->
15          match result with
16            (Nothing, ar)    => sat(st, fml, ar);
17            (Just true, ar)  => {ar};
18            (Just false, ar) => {};
19
20  def process(sts: [Strategy], fml: Formula): Par Assignment
21    fut2Par << (each(sts) >>= \(s: Strategy) ->
22      (async sat(s, fml, new Assignment()))†)

Fig. 1: A SAT solver in Encore.
can be combined using the par constructor, || :: Par t → Par t → Par t. This
constructor does not necessarily create new parallel threads of control, as this
would likely have a negative impact on performance, but rather specifies that
parallelism is available. The scheduler in the ParT implementation can choose
to spawn new tasks as it sees fit; this is modelled in the calculus as a single
rule that nondeterministically spawns a task from a par (rule Red-Schedule).
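To fix intuitions, the ParT structure can be sketched as an algebraic data type in Haskell (reusing the Fut type from the sketch above; the constructor names are ours):

  -- A ParT collection, mirroring the constructors of Par t:
  data Par a
    = Empty                 -- {}       : the empty collection
    | Singleton a           -- {v}      : a single value
    | LiftF (Fut a)         -- f◦       : a future of a value
    | LiftFP (Fut (Par a))  -- f†       : a future of a ParT
    | Par (Par a) (Par a)   -- v1 || v2 : latent, unforced parallelism

The Par constructor merely records that parallelism is available; no task is spawned until a scheduler decides to do so (cf. rule Red-Schedule in Section 3).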
The combinators can express complex coordination patterns, operate
on them in a non-blocking manner, and safely terminate speculative parallelism
even in the presence of complex workflows. These combinators will be illustrated
using an example, then explained in more detail.
Illustrative example. Consider a portfolio-based SAT solver (Fig. 1), which creates
numerous strategies, each of which searches for an assignment of variables to
Boolean values satisfying a given proposition, runs the strategies in parallel, and
accepts the first solution found. Each strategy tries to find a solution by selecting a variable
and creating two instances of the formula, one where the variable is assigned
true, the other where it is assigned false (called splitting) — strategies differ in
the order they select variables for splitting. These new instances can potentially
be solved in parallel.
The example starts in function process (line 20), which receives an array of
strategies and the formula to solve. Strategies do not interact with each other
and can be lifted to a ParT, creating a parallel pipeline (line 21) using the each
and bind (>>=) combinators. As soon as one strategy finds an assignment, the
remaining computations are terminated via the prune (<<) combinator.
For each strategy, a call to the sat function (line 8) is made in parallel using
a call to async, which in this case returns a value of type Fut (Par Assignment).
Function sat takes three arguments: a strategy, a formula and an assignment
object containing the current mapping from variables to values. This function
uses the strategy object to determine which variable to split next, extends the
assignment with new valuations (lines 9–11), recursively solves the formula (by
again calling sat), and returns an assignment object if successful. The evaluation
of the formula, evaluateFormula, returns, firstly, an optional Boolean to indicate
whether evaluation has completed, and if it has completed, whether the formula
is satisfiable, and secondly, the current (partial) variable assignment. The two
calls to evaluateFormula are grouped into a new ParT collection (using ||)
and, with the use of the >>= combinator, a new asynchronous pipeline is created
to either further evaluate the formula by calling sat, to return the assignment
in the case that a formula is satisfiable as a singleton ParT, or {} when the
assignment does not satisfy the formula (lines 14–18).
Finally, returning back to process, the prune combinator (<<) (line 21) is
used to select the first result returned by the recursive calls to sat, if there is
one. This result is converted from an option type to an empty or singleton ParT
collection (again asynchronously), which can then be used in a larger parallel
operation, if so desired. The prune combinator will begin poisoning and safely
terminating the no longer needed parallel computations, which in this case will
be an ongoing parallel pipeline of calls to sat and evaluateFormula.
ParT Combinators. The combinators are now described in detail. The combi-
nators manipulate ParT collections and were derived from Orc [11,12], although
in our setting, they are typed and redefined to be completely asynchronous,
never blocking the thread. Primitive combinators express coordination patterns
such as pipeline and speculative parallelism, and more complex patterns can be
expressed based on these primitives.
Pipeline parallelism is expressed in ParT with the sequence and bind combi-
nators. The sequence combinator, ≫ :: Par t → (t → t′) → Par t′, takes a ParT
collection and applies the function to each element in the collection, potentially
in parallel, returning a new ParT collection. The bind combinator (derived from
other combinators), >>= :: Par t → (t → Par t′) → Par t′, is similar to the
sequence combinator, except that the function returns a ParT collection and the
resulting nested ParT collection is flattened. (Par is a monad!¹) In the presence
of futures inside a ParT collection, these combinators use the future chaining
operation to create independent and asynchronous pipelines of work.
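Under the assumptions of the earlier sketches, the sequence combinator can be written by structural recursion, chaining on embedded futures instead of blocking; this is a simplification of the reduction rules given later in Fig. 6, and seqP (with its IO-returning signature) is our name:

  -- Sequence (≫): map a function over a ParT, chaining asynchronously
  -- on any embedded futures rather than waiting for them.
  seqP :: Par a -> (a -> b) -> IO (Par b)
  seqP Empty         _ = pure Empty
  seqP (Singleton v) f = pure (Singleton (f v))
  seqP (LiftF fut)   f = LiftF  <$> chain fut f
  seqP (LiftFP fut)  f = LiftFP <$> chainM fut (\p -> seqP p f)
  seqP (Par p q)     f = Par <$> seqP p f <*> seqP q f

Bind is then the derived form bindP p f = seqP p f >>= joinP, with joinP sketched in Section 3.3.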
Speculative parallelism is realised by the peek combinator, peek :: Par t →
Fut (Maybe t), which sets up a speculative computation, asynchronously waits
for a single result to be produced, and then safely terminates the speculative
work. To terminate speculative work, the ParT abstraction poisons these specu-
lative computations, which may have long parallel pipelines to which the poison
spreads recursively, producing a pandemic infection among futures, tasks and
pipelines of computations. Afterwards, poisoned computations that are no longer
needed can safely be terminated. Metaphorically, this is analogous to a tracing
garbage collector.
¹ The monad operations on Par are essentially the same as for lists but parallelised.
The value produced by peek is a future to an option type. The option type
is used to capture whether the parallel collection was empty or not. The empty
collection {} results in Nothing, and a non-empty collection results in Just v,
where v is the first value produced. The conversion to an option type is required
because ParTs cannot be tested for emptiness without blocking. The peek com-
binator is an internal combinator, i.e., it is not available to the developer and is
used by the prune combinator (explained below).
Built on top of peek is the prune combinator, << :: (Fut (Maybe t) →
Par t′) → Par t → Par t′, which applies a function in parallel to the future
produced by peek, and returns a parallel computation.
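As a rough illustration only, a drastically simplified prune can be built on the sketches above. This version extracts the first value sequentially and blocks inside the spawned task, and it omits the poisoning machinery entirely; adding non-blocking racing and safe termination is exactly what the calculus of Section 3 formalises. The names firstValue and prune are ours.

  -- Naive first-value extraction: left-to-right, no racing, no termination.
  firstValue :: Par a -> IO (Maybe a)
  firstValue Empty         = pure Nothing
  firstValue (Singleton v) = pure (Just v)
  firstValue (LiftF f)     = Just <$> get f
  firstValue (LiftFP f)    = get f >>= firstValue
  firstValue (Par p q)     = do
    r <- firstValue p
    case r of
      Just v  -> pure (Just v)
      Nothing -> firstValue q

  -- prune (<<): hand the function a future holding the first result, if any.
  prune :: (Fut (Maybe a) -> Par b) -> Par a -> IO (Par b)
  prune f p = do
    fut <- async (firstValue p)
    pure (f fut)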
Powerful combinators can be derived from the ones mentioned above. An
example of a derived combinator, which is a primitive in Orc, is the otherwise
combinator, >< :: Par t → Par t → Par t (its derivation is shown in Section 3.1).
Expression e1 >< e2 results in e1 unless it is an empty ParT, in which case it
results in e2.
Other ParT combinators are available. For instance, each :: [t] → Par t and
extract :: Par t → [t] convert between sequential collections (arrays) and ParTs. The
latter potentially requires a lot of synchronisation, as all the values in the collec-
tion need to be realised. Both have been omitted from the formalism, because
neither presents any real technical challenge — the key properties of the formal-
ism, namely, deadlock-freedom, type preservation and task safety (Section 3.5),
still hold with these extensions in place.
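In the Haskell sketch, these conversions might look as follows (eachP and extractP are our names); the synchronisation cost of extract shows up as a blocking get on every embedded future:

  -- each: lift an ordinary list into a ParT collection.
  eachP :: [a] -> Par a
  eachP = foldr (Par . Singleton) Empty

  -- extract: realise every value, waiting on all embedded futures.
  extractP :: Par a -> IO [a]
  extractP Empty         = pure []
  extractP (Singleton v) = pure [v]
  extractP (LiftF f)     = (: []) <$> get f
  extractP (LiftFP f)    = get f >>= extractP
  extractP (Par p q)     = (++) <$> extractP p <*> extractP q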
3 A Typed ParT Calculus
This section presents the operational semantics and type system of a task-based
language containing the ParT abstraction. The formal model is roughly based
on the Encore formal semantics [2,5], with many irrelevant details omitted.
3.1 Syntax
The core language (Fig. 2) contains expressions e and values v. Values include
constants c, variables x, futures f, lambda abstractions, and ParT collections of
values. Expressions include values v, function application (e e), task creation,
future chaining, and parallel combinators. Tasks are created via the async ex-
pression, which returns a future. The parallel combinators are those covered in
Section 2 (||, ≫, peek and <<), plus some derived combinators, together with
the low-level combinator join that flattens nested ParT collections. Recall that
peek is used under the hood in the implementation of <<. The status π controls how
peek behaves: when π is ∅ and the result of peek is an empty ParT collection,
e ::= v | e e | async e | e ↝ e | {e} | e || e
    | e ≫ e | e << e | e◦ | e† | join e | peekπ e
v ::= c | f | x | λx.e | {} | {v} | f◦ | f† | v || v
π ::= ' ' | ∅
Fig. 2: Syntax of the language.
the value is discarded and not written to the corresponding future. This status
helps to ensure that precisely one speculative computation writes into the fu-
ture and that a speculative computation fails to produce a value only when all
relevant tasks fail to produce a value.
ParT collections are monoids, meaning that the composition operation e || e
is associative and has {} as its unit. As such, ParT collections are sequences,
though no operations such as getting the first element are available to access
them sequentially. As an alternative, adding in commutativity of || would give
multiset semantics to the ParT collections — the operational semantics is oth-
erwise unchanged. Two for one!
A number of the constructs are defined by translation into other constructs.
let x = e in e′  ≜  (λx.e′) e
e1 >< e2  ≜  let x = e1 in
             (λy.(y ↝ (λz. match z with Nothing → e2; _ → x))†) << x
e1 >>= e2  ≜  join (e1 ≫ e2)
maybe2par  ≜  λx. match x with Nothing → {}; Just y → {y}
The encoding of let is standard. In e1 >< e2, pruning is used to test the
emptiness of e1. If it is not empty, the result of e1 is returned, otherwise
the result is e2. The definition of >>= is the standard definition of monadic bind
in terms of map (≫) and join. We assume for convenience a Maybe type and
pattern matching on it.
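As a small worked example of the >>= encoding (ours, using the reduction rules of Section 3.3):

  ({1} || {2}) >>= (λx. {x} || {x + 1})
    ≜  join (({1} || {2}) ≫ (λx. {x} || {x + 1}))
    →  join (({1} ≫ (λx. {x} || {x + 1})) || ({2} ≫ (λx. {x} || {x + 1})))   (Red-SeqP)
    →* join ({{1} || {2}} || {{2} || {3}})                                   (Red-SeqV, Red-β)
    →* ({1} || {2}) || ({2} || {3})                                          (Red-JoinP, Red-JoinV)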
3.2 Configurations
Running programs are represented by configurations (Fig. 3). Configurations can
refer to the global system or a partial view of the system. A global configura-
tion {config} captures the complete global state, e.g., {(fut f) (task_f e)} shows
a global system containing a single task running expression e. Local configura-
tions, written as config, show a partial view of the state of the program. These are
multisets of tasks, futures, poison and future chains. The empty configuration is
represented by ε. Future configurations, (fut f) and (fut f v), represent unful-
filled and fulfilled futures, respectively. Poison is the configuration (poison f)
that will eventually terminate tasks and chains writing to future f and their
dependencies. A running task (task^α_f e) has a body e and will write its result to
gconfig ::= {config}
config ::= ε | (fut f) | (fut f v) | (poison f) | (task^α_f e) | (chain^α_f g e) | config config
α ::= ' ' | ☠
Fig. 3: Runtime configurations.
future f. The chain configuration (chain^α_f g e) depends on future g: when g is
fulfilled, the chain runs expression e on the value stored in g and writes the result
into future f. Concatenation of configurations, config config′, is associative and
commutative with the empty configuration as its unit (Fig. 12).
Tasks and chains have a flag α that indicates the poisoned state of the com-
putation. Whitespace ' ' indicates that the computation has not been poisoned,
and ☠ indicates that the computation has been poisoned and can be safely ter-
minated, if it is not needed (see rule Red-Terminate of Fig. 10).
The initial configuration to evaluate expression e is {(task_f e) (fut f)}, where
the value written into future f is the result of the expression.
3.3 Reduction Rules
The operational semantics is based on small-step, reduction-context-based rules
for evaluation within tasks, and parallel reduction rules for evaluation across
configurations. Evaluation is captured by the expression-level evaluation context E,
containing a hole • that marks where the next step of the reduction will occur
(Fig. 4). Plugging an expression e into an evaluation context E, denoted E[e],
represents both the subexpression to be evaluated next and the result of reducing
that subexpression in context, in the standard fashion [21].
E ::= • | E e | v E | E ↝ e | v ↝ E | {E} | E || e | v || E | E ≫ e | v ≫ E
    | E << e | E◦ | E† | join E | peekπ E
Fig. 4: Expression-level evaluation contexts.
Reduction of configurations is denoted config →config0, which states that
config reduces in a single step to config0.
Core Expressions. The core reduction rules (Fig. 5) for functions, tasks and
futures are well-known or derived from earlier work [5]. Together, the rules Red-
Chain and Red-ChainV describe how future chaining works, initially attaching
a closure to a future (via the chain configuration), then evaluating the closure
in a new task after the future has been fulfilled.
(Red-β)
(task^α_g E[(λx.e) v]) → (task^α_g E[e[v/x]])

(Red-Async)
fresh f
(task^α_g E[async e]) → (fut f) (task^α_f e) (task^α_g E[f])

(Red-FutV)
(task^α_f v) (fut f) → (fut f v)

(Red-Chain)
fresh h
(task^α_g E[f ↝ v]) → (fut h) (chain^α_h f v) (task^α_g E[h])

(Red-ChainV)
(chain^α_g f e) (fut f v) → (task^α_g (e v)) (fut f v)
Fig. 5: Core reduction rules.
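As a small worked example (ours), consider a task g that spawns a computation and chains an increment onto the resulting future:

  (task_g (async 42 ↝ (λx. x + 1))) (fut g)
    →  (fut f) (task_f 42) (task_g (f ↝ (λx. x + 1)))                  (Red-Async)
    →  (fut f) (task_f 42) (fut h) (chain_h f (λx. x + 1)) (task_g h)  (Red-Chain)
    →  (fut f 42) (fut h) (chain_h f (λx. x + 1)) (task_g h)           (Red-FutV)
    →  (fut f 42) (fut h) (task_h ((λx. x + 1) 42)) (task_g h)         (Red-ChainV)
    →  (fut f 42) (fut h) (task_h 43) (task_g h)                       (Red-β)
    →* (fut f 42) (fut h 43) (fut g h)                                 (Red-FutV, twice)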
Sequencing. The sequencing combinator ≫ creates pipeline parallelism. Its se-
mantics are defined inductively on the structure of ParT collections (Fig. 6). The
second argument must be a function (tested in function application, but guar-
anteed by the type system). In Red-SeqS, sequencing an empty ParT results
in another empty ParT. A ParT with a value applies the function immediately
(Red-SeqV). A lifted future is asynchronously accessed by chaining the func-
tion onto it (Red-SeqF). Rule Red-SeqP recursively applies ≫ v to the two
sub-collections. A future whose content is a ParT collection chains a recursive
call to ≫ v onto the future and lifts the result back into a ParT collection
(Red-SeqFP).
(Red-SeqS)
(task^α_g E[{} ≫ v]) → (task^α_g E[{}])

(Red-SeqV)
(task^α_g E[{v} ≫ v′]) → (task^α_g E[{v′ v}])

(Red-SeqF)
(task^α_g E[f◦ ≫ v]) → (task^α_g E[(f ↝ v)◦])

(Red-SeqFP)
(task^α_g E[f† ≫ v]) → (task^α_g E[(f ↝ (λx. x ≫ v))†])

(Red-SeqP)
(task^α_g E[(v1 || v2) ≫ v]) → (task^α_g E[(v1 ≫ v) || (v2 ≫ v)])
Fig. 6: Reduction rules for the sequence combinator.
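For instance (our example, with v a function value), sequencing a mixed collection distributes over || and chains on the embedded future:

  (task^α_g (({1} || f◦) ≫ v))
    → (task^α_g (({1} ≫ v) || (f◦ ≫ v)))   (Red-SeqP)
    → (task^α_g ({v 1} || (f◦ ≫ v)))       (Red-SeqV)
    → (task^α_g ({v 1} || (f ↝ v)◦))       (Red-SeqF)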
Join. The join combinator flattens nested ParT collections of type Par (Par t)
(Fig. 7). Empty collections flatten to empty collections (Red-JoinS). Rule Red-
JoinV extracts the singleton value from a collection. A lifted future that con-
tains a ParT (type Fut (Par t)) is simply lifted to a ParT collection (Red-
JoinF). In Red-JoinFP, a future containing a nested ParT collection (type
Fut (Par (Par t))) chains a recursive call to join to flatten the inner structure. Rule
Red-JoinP applies the join combinator recursively to the values in the ParT
collection.
Prune and Peek. Pruning is the most complicated part of the calculus, though
most of the work is done using the peek combinator (Fig. 8). Firstly, rule Red-
(Red-JoinS)
(task^α_g E[join {}]) → (task^α_g E[{}])

(Red-JoinV)
(task^α_g E[join {v}]) → (task^α_g E[v])

(Red-JoinF)
(task^α_g E[join f◦]) → (task^α_g E[f†])

(Red-JoinFP)
(task^α_g E[join f†]) → (task^α_g E[(f ↝ (λx. join x))†])

(Red-JoinP)
(task^α_g E[join (v1 || v2)]) → (task^α_g E[(join v1) || (join v2)])
Fig. 7: Reduction rules for the join combinator.
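In the Haskell sketch, the same case analysis can be written directly (joinP and bindP are our names; note how Red-JoinF becomes a pure change of constructor):

  -- join: flatten a nested ParT, chaining on futures where necessary.
  joinP :: Par (Par a) -> IO (Par a)
  joinP Empty         = pure Empty
  joinP (Singleton p) = pure p
  joinP (LiftF f)     = pure (LiftFP f)   -- f◦ at type Par (Par a) is f†
  joinP (LiftFP f)    = LiftFP <$> chainM f joinP
  joinP (Par p q)     = Par <$> joinP p <*> joinP q

  -- The derived monadic bind (>>=) from Section 3.1:
  bindP :: Par a -> (a -> Par b) -> IO (Par b)
  bindP p f = seqP p f >>= joinP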
(Red-Prune)
fresh f
(task^α_g E[v << v′]) → (fut f) (task^α_f (peek v′)) (task^α_g E[v f])

(Red-PeekS∅)
(task^α_g E[peek∅ {}]) → ε

(Red-PeekS)
(task^α_g E[peek {}]) (fut g) → (fut g Nothing) (poison g)

(Red-PeekV)
(task^α_g E[peekπ ({v} || v′)]) (fut g) → (fut g (Just v)) (poison g) ⋃_{h ∈ deps(v′)} (poison h)

(Red-PeekF)
(task^α_g E[peekπ (f◦ || v)]) → (chain^α_g f (λx. peekπ {x})) (task^α_g (peek∅ v))

(Red-PeekFP)
fresh h
(task^α_g E[peekπ (f† || v)]) →
(chain^α_g f (λx. peekπ (x || (h ↝ maybe2par)†)))
(fut h) (task^α_h (peek∅ v)) (chain^α_g h (λx. peek∅ (maybe2par x)))

Fig. 8: Reduction rules for pruning. Singleton collections are handled via the
equality v = v || {}.
Prune spawns a new task that will peek the collection v′, and passes this new
task's future to the function v. The essence of the peek rules is to set up a set
of computations that compete to write into a single future, with the strict re-
quirement that Nothing is written only when all competing tasks cannot produce
a value, that is, when the ParT being peeked is empty. This is challenging due
to the lifted future ParTs (type Fut (Par t)) within a collection, because such a
future may be empty, but this fact cannot easily be seen in a non-blocking way.
Another challenge is to avoid introducing sequential dependencies between enti-
ties that can potentially run in parallel, to avoid, for instance, a non-terminating
computation blocking one that will produce a result.
A task that produces a ParT containing a value (rule Red-PeekV) writes the
value, wrapped in an option type, into the future and poisons all computations
writing into that future, recursively poisoning direct dependencies. The status ∅
on peek prevents certain peek invocations from writing a final empty result, as
in rule Red-PeekS∅. Contrast this with Red-PeekS, in which a task resulting in
an empty ParT writes Nothing into the future; in this case it is guaranteed
that no other peek exists writing to the future.
A lifted future f is guaranteed to produce a result, though it may not pro-
duce it in a timely fashion. This case is handled (rule Red-PeekF) by chaining
a function onto it that will ultimately write into future g when the value is pro-
duced, if it wins the race. Otherwise, the result of peeking into v is written into
g, unless the value produced is {} (which is controlled by ∅).
A lifted future to a ParT is not necessarily guaranteed to produce a result,
and neither is any ParT that runs in parallel with it. Thus, extra care needs
to be taken to ensure that Nothing is written if and only if both are actually
empty. This is handled in rule Red-PeekFP. Firstly, a function is chained onto
the lifted future to get access to the eventual ParT collection. This is combined
with future h, which is used to peek into v via a new task.
In all cases, computations propagate the poison state α to new configurations.
Scheduling. Rule Red-Schedule (Fig. 9) models the non-deterministic schedul-
ing of parallelism within a task, converting some of the parallelism latent in a
ParT collection into a new task. Apart from this rule, expressions within tasks
are evaluated sequentially.
(Red-Schedule)
fresh f
(task^α_g E[e1 || e2]) → (task^α_g E[e1 || f†]) (fut f) (task^α_f e2)
Fig. 9: Spawning of tasks inside a ParT.
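In the Haskell sketch, a simplistic rendering of this rule forces one branch in a freshly spawned task and leaves a lifted future in its place (scheduleP is our name; a real scheduler decides when, or whether, to apply it):

  import Control.Exception (evaluate)

  -- Red-Schedule: convert latent parallelism into an actual task by
  -- realising one branch in a freshly spawned task.
  scheduleP :: Par a -> IO (Par a)
  scheduleP (Par e1 e2) = do
    f <- async (evaluate e2)   -- the new task forces the branch e2
    pure (Par e1 (LiftFP f))
  scheduleP p = pure p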
Poisoning and Termination. The rules for poisoning and termination (Fig. 10)
are based on a poisoned carrier configuration defined as (PC^α_f e) ::= (task^α_f e) |
(chain^α_f g e); these rules rely on the definition of when a future is needed (Defini-
tion 2), which in turn is defined in terms of the futures on which a task depends
to produce a value (Definition 1).
Definition 1. The dependencies of an expression e, deps(e), is the set of futures
upon which the computation of e depends in order to produce a value:

deps(f) = {f}
deps(c) = deps({}) = deps(x) = ∅
deps({e}) = deps(λx.e) = deps(async e) = deps(e◦) = deps(e†) =
  deps(peekπ e) = deps(join e) = deps(e)
deps(e e′) = deps(e ↝ e′) = deps(e || e′) = deps(e ≫ e′) = deps(e >>= e′) =
  deps(e >< e′) = deps(e << e′) = deps(e) ∪ deps(e′)
deps((task^α_f e)) = deps(e)
deps((chain^α_f g e)) = {g} ∪ deps(e).
Definition 2. A future f is needed in configuration config, denoted config ⊢
needed(f), whenever some other element of the configuration depends on it:

config ⊢ needed(f) iff (PC^α_g e) ∈ config ∧ f ∈ deps((PC^α_g e)) ∧ (fut f) ∈ config.
(Red-Poison)
(poison f) (PC_f e) → (poison f) (PC^☠_f e) ⋃_{g ∈ deps((PC_f e))} (poison g)

(Red-Terminate)
¬(config ⊢ needed(f))
{(PC^☠_f e) config} → {config}
Fig. 10: Poisoning reduction rules.
Configurations go through a two-step process before being terminated. In
the first step (rule Red-Poison), the poisoning of future f poisons any task
or chain writing to f, marks it with ☠, and transmits the poison to the
direct dependencies of the expression e in the task or chain. In the second step
(Red-Terminate), a poisoned configuration is terminated when there is no
other configuration relying on its result; that is, a poisoned task or chain is
terminated if there is no expression around to keep it alive. This rule is global,
referring to the entire configuration. Termination can be implemented using
tracing garbage collection, though in the semantics a more global specification
of dependency is used.
An example (Fig. 11) illustrates how poisoning and termination work to pre-
vent a task that is still needed from being terminated. Initially, there is a group of
tasks (squares) and futures (circles) (Fig. 11A), where one of the tasks completes
and writes a value to future f. This causes all of the other tasks writing to f to
be poisoned, via rule Red-PeekV (Fig. 11B). After application of rule Red-
Poison, the dependent tasks and futures are recursively poisoned (Fig. 11C).
Finally, the application of rule Red-Terminate terminates tasks that are not
needed (Fig. 11D). Task e1 is not terminated, as future g is required by the task
computing e g.
Configurations. The concatenation operation on configurations is commutative
and associative and has the empty configuration as its unit (Fig. 12). We assume
that these equivalences, along with the monoid axioms for ||, can be applied at
any time during reduction.
The reduction rules for configurations (Fig. 13) have the individual configu-
ration reduction rules at their heart, along with rules for parallel evaluation
of non-conflicting sub-configurations, as is standard in rewriting logic [14].
[Figure 11: four panels, A to D, showing a configuration of tasks and futures
being poisoned and then terminated by rules Red-PeekV, Red-Poison and
Red-Terminate.]
Fig. 11: Safely poisoning and terminating a configuration. The letter in the
top right corner indicates the order. Tasks are represented by squares, which
contain a body and have an arrow to the future they write to. Futures (circles)
have dotted arrows to tasks that use them. Grey represents poisoned
configurations. Terminated configurations are removed.
config ε ≡ config    config config′ ≡ config′ config
config (config′ config″) ≡ (config config′) config″
config ≡ config′  ⟹  {config} ≡ {config′}
Fig. 12: Configuration equivalence modulo associativity and commutativity.
3.4 Type System
The type system (Fig. 14) assigns the following types to terms:
τ ::= K | Fut τ | Par τ | Maybe τ | τ → τ

where K represents the basic types, Fut τ is the type of a future containing
a value of type τ, Par τ is the type of a ParT collection of type τ, Maybe τ
represents an option type, and τ → τ represents function types. We also let ρ
range over types.
The key judgement in the type system is Γ ⊢ρ e : τ, which asserts that, in
typing context Γ, the expression e is a well-formed term with type τ, where
ρ is the expected return type of the task in which this expression appears;
ρ is required to type peek. The typing context contains the types of both free
variables and futures.
Rule TS-Async gives the type for task creation and rule TS-Chain shows
how to operate on such values — future chaining has the type of map for the
config → config′  ⟹  config config″ → config′ config″

config0 → config0′   config1 → config1′  ⟹  config0 config1 → config0′ config1′

config0 config″ → config0′ config″   config1 config″ → config1′ config″
  ⟹  config0 config1 config″ → config0′ config1′ config″

config → config′  ⟹  {config} → {config′}
Fig. 13: Configuration reduction rules
(TS-Const)  c is a constant of type τ  ⟹  Γ ⊢ρ c : τ

(TS-Fut)  f : Fut τ ∈ Γ  ⟹  Γ ⊢ρ f : Fut τ

(TS-X)  x : τ ∈ Γ  ⟹  Γ ⊢ρ x : τ

(TS-App)  Γ ⊢ρ e1 : τ′ → τ   Γ ⊢ρ e2 : τ′  ⟹  Γ ⊢ρ e1 e2 : τ

(TS-Fun)  Γ, x : τ ⊢ρ e : τ′  ⟹  Γ ⊢ρ λx.e : τ → τ′

(TS-Async)  Γ ⊢ρ e : τ  ⟹  Γ ⊢ρ async e : Fut τ

(TS-Chain)  Γ ⊢ρ e1 : Fut τ′   Γ ⊢ρ e2 : τ′ → τ  ⟹  Γ ⊢ρ e1 ↝ e2 : Fut τ

(TS-EmptyPar)  Γ ⊢ρ {} : Par τ

(TS-SingletonPar)  Γ ⊢ρ e : τ  ⟹  Γ ⊢ρ {e} : Par τ

(TS-LiftF)  Γ ⊢ρ e : Fut τ  ⟹  Γ ⊢ρ e◦ : Par τ

(TS-LiftFP)  Γ ⊢ρ e : Fut (Par τ)  ⟹  Γ ⊢ρ e† : Par τ

(TS-Par)  Γ ⊢ρ e1 : Par τ   Γ ⊢ρ e2 : Par τ  ⟹  Γ ⊢ρ e1 || e2 : Par τ

(TS-Sequence)  Γ ⊢ρ e1 : Par τ′   Γ ⊢ρ e2 : τ′ → τ  ⟹  Γ ⊢ρ e1 ≫ e2 : Par τ

(TS-Join)  Γ ⊢ρ e : Par (Par τ)  ⟹  Γ ⊢ρ join e : Par τ

(TS-Otherwise)  Γ ⊢ρ e1 : Par τ   Γ ⊢ρ e2 : Par τ  ⟹  Γ ⊢ρ e1 >< e2 : Par τ

(TS-Peek)  Γ ⊢_{Maybe ρ} e : Par ρ  ⟹  Γ ⊢_{Maybe ρ} peekπ e : τ

(TS-Prune)  Γ ⊢ρ e1 : Fut (Maybe τ) → Par τ′   Γ ⊢ρ e2 : Par τ  ⟹  Γ ⊢ρ e1 << e2 : Par τ′

(TS-Bind)  Γ ⊢ρ e1 : Par τ′   Γ ⊢ρ e2 : τ′ → Par τ  ⟹  Γ ⊢ρ e1 >>= e2 : Par τ
Fig. 14: Expression Typing.
Fut constructor. Rules TS-EmptyPar, TS-SingletonPar, TS-LiftF, TS-
LiftFP, and TS-Par give the typings for constructing ParT collections. Rule
TS-Sequence implies that sequencing has the type of map for the Par con-
structor. TS-Bind and TS-Join give >>= and join the types of the monadic
bind and join operators for the Par constructor, respectively. Rule TS-Prune
captures the communication between the two parameters via the future passed
as an argument to the first parameter: the future will contain the first value
of the second parameter, if there is one, captured by the Maybe type. Rule TS-
Peek captures the conversion of the singleton or empty argument of peek from
Par ρ to Maybe ρ, the expected result type of the surrounding task. Because
peek terminates the task and does not return locally, its return type can be any
type.
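As a small example of these rules in action (our example): from f : Fut Int ∈ Γ, rule TS-Fut gives Γ ⊢ρ f : Fut Int and TS-LiftF gives Γ ⊢ρ f◦ : Par Int; with Γ ⊢ρ (λx. x + 1) : Int → Int, rule TS-Sequence then yields Γ ⊢ρ f◦ ≫ (λx. x + 1) : Par Int.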
Well-formed configurations (Fig. 15) are expressed by the judgement Γ ⊢
config ok, where Γ contains the assumptions about the future types in config. Rules
(T-Fut)  f ∈ dom(Γ)  ⟹  Γ ⊢ (fut f) ok

(T-FutV)  f : Fut τ ∈ Γ   Γ ⊢τ v : τ  ⟹  Γ ⊢ (fut f v) ok

(T-Poison)  f : Fut τ ∈ Γ  ⟹  Γ ⊢ (poison f) ok

(T-Task)  f : Fut τ ∈ Γ   Γ ⊢τ e : τ  ⟹  Γ ⊢ (task^α_f e) ok

(T-Chain)  f1 : Fut τ1 ∈ Γ   f2 : Fut τ2 ∈ Γ   Γ ⊢τ2 e : τ1 → τ2
  ⟹  Γ ⊢ (chain^α_{f2} f1 e) ok

(T-Config)  Γ ⊢ config1 ok   Γ ⊢ config2 ok   futset(config1) ∩ futset(config2) = ∅
  ⟹  Γ ⊢ config1 config2 ok

(T-GConfig)  Γ ⊢ config ok   dom(Γ) = futset(config)   TaskSafe(config)   AcyclicDep(config)
  ⟹  Γ ⊢ {config} ok
Fig. 15: Configuration Typing.
T-Task and T-Chain propagate the eventual expected result type on the turn-
stile ⊢ when typing the enclosed expression. Rule T-Config depends upon the
following definition, a function that collects all futures defined in a configuration:
Definition 3. Define futset(config) as:

futset((fut f)) = futset((fut f v)) = {f}
futset(config1 config2) = futset(config1) ∪ futset(config2)
futset(_) = ∅ otherwise.
Rule T-GConfig defines the well-formedness of global configurations, judge-
ment Γ ⊢ {config} ok. This rule depends on a number of definitions that capture
properties of futures and tasks and of the dependencies between futures. The in-
variance of these properties is ultimately used to prove type soundness and other
safety properties of the system.
Definition 4. Define the following functions for collecting the different kinds of
tasks and chains of a configuration:

regular_f(config) = {(task^α_f e) ∈ config | e ≠ peekπ e′}
                  ∪ {(chain^α_f g e) ∈ config | e ≠ λ_. peekπ e′}
peeker_f(config) = {(task^α_f (peek e)) ∈ config}
                  ∪ {(chain^α_f g (λ_. peek e)) ∈ config}
peeker∅_f(config) = {(task^α_f (peek∅ e)) ∈ config}
                  ∪ {(chain^α_f g (λ_. peek∅ e)) ∈ config}
Tasks with no peek expression are called regular tasks, while peeker tasks
have a peek expression; there are both ∅- and non-∅-peeker tasks. These
functions can be used to partition the tasks and chains in a configuration into
these three kinds. These definitions consider peek expressions
only at the top level of a task, although the syntax allows them to be anywhere.
Based on the reduction rules, one can prove that peek only appears at the top
level of a task or chain, so no task or chain is excluded by these definitions.
Definition 5. Define the predicate TaskSafe(config) as follows:

TaskSafe(config) iff for all f ∈ futset(config),
  |regular_f(config) ∪ peeker_f(config)| ≤ 1
  ∧ ((fut f) ∈ config ∧ (poison f) ∉ config ⇒
       |regular_f(config) ∪ peeker_f(config)| = 1)
  ∧ (|regular_f(config)| = 1 ⇒ peeker_f(config) ∪ peeker∅_f(config) = ∅)
  ∧ ((task^α_f (peek {})) ∈ config ⇒ (fut f) ∈ config ∧ peeker∅_f(config) = ∅)
Predicate TaskSafe(config) (Definition 5) describes the structure of the con-
figuration config. It states that:
– there is at most one regular or non-∅-peeker task per future;
– if a future has not yet been fulfilled and it is not poisoned, then there exists
exactly one regular task or non-∅-peeker task that fulfils it;
– regular tasks and peeker tasks do not write to the same futures; and
– if a peeker task is about to fulfil a future with Nothing, then the future
is unfulfilled and no ∅-peeker task fulfilling the same future exists.
The following definition establishes dependencies between futures. Predicate
config ⊢ f ◁ g holds for every future g whose eventual value could influence the
result stored in future f.
Definition 6. Define the predicate config ⊢ f ◁ g as the least transitive relation
satisfying the following rules:

(task^α_f e) ∈ config   g ∈ deps(e)  ⟹  config ⊢ f ◁ g
(chain^α_f h e) ∈ config   g ∈ deps(e) ∪ {h}  ⟹  config ⊢ f ◁ g
(fut f v) ∈ config   g ∈ deps(v)  ⟹  config ⊢ f ◁ g
Definition 7. Predicate AcyclicDep(config) holds iff the relation ◁ is acyclic,
where ◁ is defined for config in Definition 6.
Rule T-GConfig for well-formed global configurations requires that pre-
cisely the futures that appear in the typing environment Γappear in the config-
uration, that the configuration is well-formed, and that it satisfies the properties
TaskSafe and AcyclicDep. By including these properties as part of the well-
formedness rule for global configurations, type preservation (Lemma 1) makes
them invariants. These invariants on the structure of tasks and the dependency
relation together ensure that well-typed configurations are deadlock-free, as we
explore next.
3.5 Formal Properties
The calculus is sound and deadlock-free. These results extend previous work [15]
to address the pruning combinator.
Lemma 1 (Type Preservation). If Γ ⊢ {config} ok and {config} → {config′},
then there exists a Γ′ such that Γ′ ⊇ Γ and Γ′ ⊢ {config′} ok.

Proof. By induction on the derivation of {config} → {config′}. In particular, the in-
variance of AcyclicDep is shown by considering the changes to the dependen-
cies caused by each reduction rule. The only place where new dependencies are
introduced is when new futures are created. Adding a fresh future to the dependency
relation cannot introduce cycles. □
The following lemma states that the notion of needed, which determines
whether or not to garbage collect a poisoned task or chain, is anti-monotonic,
meaning that after a future is no longer needed according to the definitions, it
does not subsequently become needed.
Lemma 2 (Safe Task Kill). If Γ ⊢ {config} ok and {config} → {config′}, then
¬(config ⊢ needed(f)) implies ¬(config′ ⊢ needed(f)).

Proof. A future is initially created in a configuration where it is needed. If ever
a future disappears from deps(e), it can never reappear. □
This lemma rules out the situation where a task is poisoned and garbage
collected, but is subsequently needed. For instance, the application of rule Red-
Terminate in Fig. 11C kills tasks e2, e3, e5 and e6 (shown in Fig. 11D). If the
future into which these tasks were going to write were needed afterwards, there
would be a deadlock, as a new task could chain on that future, which would
never be fulfilled.
Definition 8 (Terminal Configuration). A global configuration {config} is
terminal iff every element of config has one of the following shapes: (fut f),
(fut f v) or (poison f).
Lemma 3 (Deadlock-Freedom/Progress). If Γ ⊢ {config} ok, then config is
a terminal configuration, or there exists a config′ such that {config} → {config′}.

Proof. By induction on a derivation of {config} → {config′}, relying on the
invariance of AcyclicDep and Lemma 2. □
Deadlock-freedom guarantees that some reduction rule can be applied to a
well-typed, non-terminal, global configuration; this is essentially the progress
property required to prove type safety. It further implies that there are no local
deadlocks, such as a deadlocked configuration like (chain_f g e) (chain_g f e′).
Such a configuration fails to satisfy the AcyclicDep invariant and thus cannot
exist. If mutable state were added to the calculus, deadlock-freedom would be lost.
Implementations. There are two prototypes of the ParT abstraction. In the
first prototype,² ParT has been written as an extension to the Encore compiler
(written in Haskell) and runtime (written in C), but it can be implemented in
well-established languages with notions of tasks and futures. This prototype in-
tegrates futures produced by tasks and active objects with the ParT abstraction.
The other prototype has been written in Clojure,³ which is not statically typed.
Both prototypes follow the semantics to guide the implementation. In practice,
this means that the semantic rules are written in such a way that they can be
easily mimicked in a library or in a language runtime.
4 Related Work
Our combinators have been adapted from those of the Orc [11, 12] programming
language. In ParT, these combinators are completely asynchronous and are inte-
grated with futures. ParTs are first-class citizens and can be nested, Par (Par t),
neither of which is possible in Orc, which sits on top of the expression being
coordinated and offers only a flat collection of values.
Meseguer et al. [1] used rewriting logic semantics and Maude to provide a
distributed implementation of Orc. Their focus on the semantic model allows
them to model check Orc programs. In this paper, our semantics is more fine-
grained, and guides the implementation in a multicore setting.
ParT uses a monad to encapsulate asynchronous computations, which is not
a new idea [3,13,20]. For instance, F# expresses asynchronous workflows using a
continuation monad [20] but cannot create more parallelism within the monad,
making the model better suited for event-based programming. In contrast, our
approach can spawn parallel computations and include them within ParTs.
Other work implements Orc combinators in terms of a monad within the pure
functional language Haskell [3,13]. One of these approaches [3] relies on threads
and channels and implements the prune combinator using sequential compo-
sition, losing potential parallelism. The other approach [13] uses Haskell threads
and continuations to model parallel computations and re-designs the prune
combinator in terms of a cut combinator that sparks off parallel computa-
tions, waits until there is a value available and terminates, in bulk, the remain-
ing computations. In contrast, the ParT abstraction relies on more lightweight
tasks instead of threads, has fully asynchronous combinators, which maintain
the throughput of the system, and terminates speculative work by recursively
poisoning dependencies and terminating computations that are not needed.
An approach to increase parallelism is to create parallel versions of existing
collections. For instance, Haskell [10] adds parallel operations to its collections,
and the Scala parallel collections [18] add new methods to their collections, par
and seq, that return a parallel and a sequential version of the collection, respectively. However,
the ParT abstraction.
² Encore ParT prototype: http://52.50.101.143/kompile/encore/
³ Clojure ParT prototype: https://github.com/kikofernandez/ParT
Recent approaches to creating pipeline parallelism are the Flowpool [19] and
FlumeJava [4] abstractions. In the former, functions are attached to a Flowpool
and, with the foreach combinator, the attached functions are applied to items
asynchronously added to the Flowpool, thereby creating parallel pipelines of com-
putations. The latter, FlumeJava, is a library extending the MapReduce frame-
work; it provides high-level constructs to create efficient data-parallel pipelines
of MapReduce jobs, via an optimisation phase. The ParT abstraction can create
data-parallel pipelines with the sequence (≫) and bind (>>=) combinators (at the
moment there is no optimisation phase) and can furthermore terminate speculative
work.
Existing approaches to safely terminating speculative parallelism [6,9,17] did
not integrate well with the ParT abstraction. For instance, the Cilk program-
ming language provides the abort keyword to terminate all speculative work
generated by a procedure [6]. The termination does not happen immediately; in-
stead, computations are marked as not-runnable, and already running
computations do not stop execution until their work
is finished. In other approaches, the developer specifies termination checkpoints
at which a task may be terminated [9, 17]. This solves the previous problem
and improves responsiveness but adds extra overhead (for the checking) and
puts the responsibility on the developer, who specifies the location of the check-
points. In our design, the developer does not need to specify these checkpoints
and speculative work is terminated as soon as there are no dependencies. No
other approach considers that the results of tasks may be needed elsewhere.
5 Conclusion and Future Work
This paper presented the ParT asynchronous, parallel collection abstraction,
and a collection of combinators that operate over it. ParT was formalised as a
typed calculus of tasks, futures and Orc-like combinators. A primary character-
istic of the calculus is that it captures the non-blocking implementation of the
combinators, including an algorithm for pruning that tracks down dependencies
and is safe with respect to shared futures. The ParT abstraction has prototypes
in the Encore (statically typed) and Clojure (dynamically typed) programming
languages.
Currently, the calculus does not support side-effects. These are challenging to
deal with, due to potential race conditions and terminated computations leav-
ing objects in an inconsistent state. We expect that Encore’s capability type
system [2] can be used to avoid data races, and a run-time, transactional mecha-
nism can deal with the inconsistent state. At the start of the paper we mentioned
that ParT was integrated into an actor-based language, but the formalism in-
cluded no actors. This work abstracted away the actors, replacing them by tasks
and futures—message sends in the Encore programming language return results
via futures—which were crucial for tying together the asynchronous computa-
tions underlying a ParT. Actors can easily be re-added as soon as the issues of
shared mutable state have been addressed. The distribution aspect of actors has
not yet been considered in Encore or in the ParT abstraction. This would be
an interesting topic for future work. Beyond these extensions, we also plan to
extend the range of combinators supporting the ParT abstraction.
References
1. Musab AlTurki and José Meseguer. Dist-Orc: A rewriting-based distributed im-
plementation of Orc with formal analysis. In Peter Csaba Ölveczky, editor, Pro-
ceedings First International Workshop on Rewriting Techniques for Real-Time Sys-
ceedings First International Workshop on Rewriting Techniques for Real-Time Sys-
tems, RTRTS 2010, Longyearbyen, Norway, April 6-9, 2010., volume 36 of EPTCS,
pages 26–45, 2010.
2. Stephan Brandauer, Elias Castegren, Dave Clarke, Kiko Fernandez-Reyes,
Einar Broch Johnsen, Ka I. Pun, Silvia Lizeth Tapia Tarifa, Tobias Wrigstad,
and Albert Mingkun Yang. Parallel objects for multicores: A glimpse at the paral-
lel language Encore. In Marco Bernardo and Einar Broch Johnsen, editors, Formal
Methods for Multicore Programming - 15th International School on Formal Methods
for the Design of Computer, Communication, and Software Systems, SFM 2015,
Bertinoro, Italy, June 15-19, 2015, Advanced Lectures, volume 9104 of Lecture
Notes in Computer Science, pages 1–56. Springer, 2015.
3. Marco Devesas Campos and Luís Soares Barbosa. Implementation of an orchestra-
tion language as a Haskell domain specific language. Electr. Notes Theor. Comput.
Sci., 255:45–64, 2009.
4. Craig Chambers, Ashish Raniwala, Frances Perry, Stephen Adams, Robert R.
Henry, Robert Bradshaw, and Nathan Weizenbaum. FlumeJava: Easy, efficient
data-parallel pipelines. In Proceedings of the 31st ACM SIGPLAN Conference
on Programming Language Design and Implementation, PLDI ’10, pages 363–375,
New York, NY, USA, 2010. ACM.
5. Dave Clarke and Tobias Wrigstad. Vats: A safe, reactive storage abstraction.
In Erika Ábrahám, Marcello M. Bonsangue, and Einar Broch Johnsen, editors,
Theory and Practice of Formal Methods – Essays Dedicated to Frank de Boer on
the Occasion of His 60th Birthday, volume 9660 of Lecture Notes in Computer
Science, pages 140–154. Springer, 2016.
6. Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of
the Cilk-5 multithreaded language. In Jack W. Davidson, Keith D. Cooper, and
A. Michael Berman, editors, Proceedings of the ACM SIGPLAN ’98 Conference on
Programming Language Design and Implementation (PLDI), Montreal, Canada,
June 17-19, 1998, pages 212–223. ACM, 1998.
7. Robert H. Halstead, Jr. Multilisp: A language for concurrent symbolic computa-
tion. ACM Trans. Program. Lang. Syst., 7(4):501–538, October 1985.
8. Rich Hickey. The Clojure programming language. In Johan Brichau, editor, Pro-
ceedings of the 2008 Symposium on Dynamic Languages, DLS 2008, July 8, 2008,
Paphos, Cyprus, page 1. ACM, 2008.
9. Shams Imam and Vivek Sarkar. The Eureka programming model for speculative
task parallelism. In John Tang Boyland, editor, 29th European Conference on
Object-Oriented Programming, ECOOP 2015, July 5-10, 2015, Prague, Czech Re-
public, volume 37 of LIPIcs, pages 421–444. Schloss Dagstuhl - Leibniz-Zentrum
fuer Informatik, 2015.
10. Simon L. Peyton Jones. Harnessing the multicores: Nested data parallelism in
Haskell. In G. Ramalingam, editor, Programming Languages and Systems, 6th
Asian Symposium, APLAS 2008, Bangalore, India, December 9-11, 2008. Proceed-
ings, volume 5356 of Lecture Notes in Computer Science, page 138. Springer, 2008.
11. David Kitchin, William R. Cook, and Jayadev Misra. A language for task or-
chestration and its semantic properties. In Proceedings of the 17th International
Conference on Concurrency Theory, CONCUR’06, pages 477–491, Berlin, Heidel-
berg, 2006. Springer-Verlag.
12. David Kitchin, Adrian Quark, William Cook, and Jayadev Misra. The Orc pro-
gramming language. In Proceedings of the Joint 11th IFIP WG 6.1 International
Conference FMOODS ’09 and 29th IFIP WG 6.1 International Conference FORTE
’09 on Formal Techniques for Distributed Systems, FMOODS ’09/FORTE ’09,
pages 1–25, Berlin, Heidelberg, 2009. Springer-Verlag.
13. John Launchbury and Trevor Elliott. Concurrent orchestration in Haskell. In
Jeremy Gibbons, editor, Proceedings of the 3rd ACM SIGPLAN Symposium on
Haskell, Haskell 2010, Baltimore, MD, USA, 30 September 2010, pages 79–90.
ACM, 2010.
14. Narciso Martí-Oliet and José Meseguer. Rewriting logic: roadmap and bibliogra-
phy. Theor. Comput. Sci., 285(2):121–154, 2002.
15. Daniel McCain. Parallel combinators for the Encore programming language. Mas-
ter’s thesis, Uppsala University, 2016.
16. Martin Odersky, Lex Spoon, and Bill Venners. Programming in Scala: A Compre-
hensive Step-by-step Guide. Artima Incorporation, USA, 1st edition, 2008.
17. Tim Peierls, Brian Goetz, Joshua Bloch, Joseph Bowbeer, Doug Lea, and David
Holmes. Java Concurrency in Practice. Addison-Wesley Professional, 2005.
18. Aleksandar Prokopec, Phil Bagwell, Tiark Rompf, and Martin Odersky. A generic
parallel collection framework. In Emmanuel Jeannot, Raymond Namyst, and Jean
Roman, editors, Euro-Par 2011 Parallel Processing - 17th International Confer-
ence, Euro-Par 2011, Bordeaux, France, August 29 - September 2, 2011, Proceed-
ings, Part II, volume 6853 of Lecture Notes in Computer Science, pages 136–147.
Springer, 2011.
19. Aleksandar Prokopec, Heather Miller, Tobias Schlatter, Philipp Haller, and Martin
Odersky. Flowpools: A lock-free deterministic concurrent dataflow abstraction. In
Hironori Kasahara and Keiji Kimura, editors, Languages and Compilers for Parallel
Computing, 25th International Workshop, LCPC 2012, Tokyo, Japan, September
11-13, 2012, Revised Selected Papers, volume 7760 of Lecture Notes in Computer
Science, pages 158–173. Springer, 2012.
20. Don Syme, Tomas Petricek, and Dmitry Lomov. The F# asynchronous program-
ming model. In Ricardo Rocha and John Launchbury, editors, Practical Aspects of
Declarative Languages - 13th International Symposium, PADL 2011, Austin, TX,
USA, January 24-25, 2011. Proceedings, volume 6539 of Lecture Notes in Computer
Science, pages 175–189. Springer, 2011.
21. Andrew K. Wright and Matthias Felleisen. A syntactic approach to type soundness.
Inf. Comput., 115(1):38–94, 1994.