“Carbon Credits” for Resource-Bounded
Computations using Amortised Analysis
Steffen Jost1, Hans-Wolfgang Loidl2, Kevin Hammond1,
Norman Scaife3, and Martin Hofmann2
1 St Andrews University, St Andrews, Scotland, UK
{jost,kh}@cs.st-andrews.ac.uk
2 Ludwig-Maximilians University, Munich, Germany
{hwloidl,mhofmann}@tcs.ifi.lmu.de
3 Université Blaise-Pascal, Clermont-Ferrand, France
Norman.Scaife@univ-bpclermont.fr
Abstract. Bounding resource usage is important for a number of areas, notably real-time embedded systems and safety-critical systems. In this paper, we present a fully automatic static type-based analysis for inferring upper bounds on resource usage for programs involving general algebraic datatypes and full recursion. Our method can easily be used to bound any countable resource, without needing to revisit proofs. We apply the analysis to the important metrics of worst-case execution time, stack and heap-space usage. Our results from several realistic embedded control applications demonstrate good matches between our inferred bounds and measured worst-case costs for heap and stack usage. For time usage we infer good bounds for one application. Where we obtain less tight bounds, this is due to the use of software floating-point libraries.
1 Introduction
Programs often produce undesirable “emissions”, such as littering the memory with garbage. Our work is aimed at predicting limits on such emissions in advance of execution. “Emissions” here refer to any quantifiable resource that is used by the program. In this paper, we will focus on the key resources of worst-case execution time, heap allocations, and stack usage. Predicting emissions limits is clearly desirable in general, and can be vital in safety-critical, embedded systems.
Our method can be explained by analogy to an attempted countermeasure to global warming: some governments are attempting to reduce industrial pollution by issuing tradable carbon credits. The law then dictates that each CO2 emission must be offset by expending an appropriate number of carbon credits. It follows that the total amount of emissions is a priori bounded by the number of carbon credits that have been previously issued by the authorities. Following this analogy, we will similarly issue credits for computer programs. The “emissions” of each program operation must then be immediately justified by spending a corresponding number of credits. The use of “carbon credits” for software analysis does, however, have several advantages over the political situation: i) we can
Authors version for personal use only. Published at Formal Methods 2009. The original publication is available at www.springerlink.com
prove that each and every emission that occurs is legitimate and that it has been properly paid for by spending credits; ii) we have zero bureaucratic overhead: since we use an efficient compile-time analysis, no modifications whatever are needed to the original program, and we therefore do not change actual execution costs; and iii) we provide an automatic static analysis that, when successful, provides a guaranteed upper bound on the number of credits that must be issued initially to ensure that a program can run to completion, rather than using a heuristic to determine the requirements. The amount of credits a program is allowed to spend is specified as part of its type. This allows the absolute number of credits to vary in relation to the actual input, as shown below.
Example: Tree Processing. Consider a tree-processing function mill, whose argument has been determined by our analysis to have type tree(Node⟨7⟩ | Leaf⟨0.5⟩). Given this type, we can determine that processing the first tree below requires at most 23 = ⌊23.5⌋ credits†: 7 credits per node and 0.5 credits for each leaf reference; and that processing either of the other trees requires at most ⌊15.5⌋ credits, regardless of aliasing.
[Figure: three example trees with labelled nodes (z, y, x, w, v, u, t, s; a, b, c, d, e); drawing not recoverable from this extraction.]
In fact, the type given by our analysis allows us to easily determine an upper bound on the cost of mill for any input tree. For example, for a tree of 27 nodes and 75 leaves, we can compute the credit quota from the type as 7·27 + 0.5·75 = 226.5, without needing to consider the actual node or leaf values. The crucial point is that while we are analysing mill, our analysis only needs to keep track of this single number. Indeed, the entire dynamic state of the program at any time during its execution could be abstracted into such a number, representing the total unspent credits at that point in its execution. Because the number of credits must always be non-negative, this then establishes an upper bound on the total future execution costs (time or space, etc.) of the program. Note that since this includes the cost incurred by all subsequent function calls, recursive or otherwise, it follows that our analysis will also deal with outsourced emissions.
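The credit-quota computation sketched above is elementary; the following snippet (our own illustration, not the paper's implementation) computes it from the constructor weights of the annotated type tree(Node⟨7⟩ | Leaf⟨0.5⟩):

```python
from fractions import Fraction

# Credit weights taken from the annotated argument type of mill,
# tree(Node<7> | Leaf<0.5>): 7 credits per Node, 0.5 per leaf reference.
NODE_CREDIT = Fraction(7)
LEAF_CREDIT = Fraction(1, 2)

def credit_quota(nodes: int, leaves: int) -> Fraction:
    """Upper bound on the credits needed to process a tree, computed
    from constructor counts alone; the stored values are irrelevant."""
    return NODE_CREDIT * nodes + LEAF_CREDIT * leaves

# The worked example from the text: 27 nodes and 75 leaves.
print(credit_quota(27, 75))  # 453/2, i.e. 226.5
```

Exact rationals are used because the analysis permits fractional annotations such as 0.5.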
Novel contributions made by this paper: We present a fully automatic compile-time analysis for inferring upper bounds on generic program execution costs, in the form of a new resource-aware type system. The underlying principle used in our automatic analysis is a modified version of Tarjan’s amortised cost analysis [17], as previously applied to heap allocation by Hofmann and Jost [11]. We prove that the annotated type of a term describes its maximal resource requirement with respect to a given operational semantics. Our analysis becomes automatic by providing type inference for this system and solving any constraints that are generated by using an external linear programming solver.
†Note that while only whole credits may be spent, fractional credits can be accumulated.
Moreover, we extend previous work:
a) by dealing with arbitrary (recursive) algebraic datatypes;
b) by providing a unified generic approach that presents a soundness proof that holds for arbitrary cost metrics and for many different operational models;
c) by applying the approach to real-world examples, notably worst-case execution time on the Renesas M32C/85U processor.
Section 2 introduces a simple functional language that exhibits our analysis. We consider the soundness of our analysis in Section 5, discuss several example programs in Section 6 and cover related work in Section 7. Section 8 concludes.
2 The Schopenhauer Notation
We illustrate our approach using a simple, strict, purely functional programming
language Schopenhauer (named after the German philosopher), which includes
recursive datatypes and full recursion, and which is intended as a simple core
language for richer notations, such as our own Hume language [9]. Schopenhauer
programs comprise a set of one or more (possibly mutually recursive) function
declarations. For simplicity, functions and datatypes are monomorphic (we are
currently investigating the extension to polymorphic definitions).
    prog ::= varid1 vars1 = expr1; ... ; varidn varsn = exprn        (n ≥ 1)
    vars ::= ⟨varid1, ..., varidn⟩                                   (n ≥ 0)
    expr ::= const | varid | varid vars | conid vars
           | case varid of conid vars -> expr1 | expr2
           | let varid = expr1 in expr2
           | LET varid = expr1 IN expr2
The Schopenhauer syntax is fairly conventional, except that: i) we distinguish variable and constructor identifiers; ii) pattern matches are not nested and only allow two branches; iii) we have two forms of let-expression; and iv) function calls are in let-normal form, i.e. arguments are always simple variables. The latter restriction is purely for convenience, since it simplifies the construction of our soundness proof in Section 5 by removing some tedious redundancies. There is no drawback to this since Schopenhauer features two kinds of let-expressions, let and LET, the former appearing in source programs, and the latter introduced as a result of internal translation. Both forms have identical semantics but they may have differing operational costs, depending on the desired operational model and on the translation into let-normal form. Since the ordinary let-expression usually incurs some overhead for managing the created reference, it cannot be used to transform expressions into let-normal form in a cost-preserving manner.
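The grammar above can be rendered as a small abstract syntax, sketched here in Python (our own rendering; the paper gives only the concrete syntax, and the revApp definition below is an assumed example, though the function name appears in the paper's benchmarks):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Const:         # integer or other constant
    val: int

@dataclass
class Var:           # varid
    name: str

@dataclass
class App:           # varid vars -- arguments are plain variables (let-normal form)
    fid: str
    args: List[str]

@dataclass
class ConApp:        # conid vars
    con: str
    args: List[str]

@dataclass
class Case:          # case x of c<y1,...,yk> -> e1 | e2  (exactly two branches)
    scrut: str
    con: str
    binders: List[str]
    if_match: object
    otherwise: object

@dataclass
class Let:           # source-level let
    var: str
    bound: object
    body: object

@dataclass
class LET:           # introduced by translation into let-normal form
    var: str
    bound: object
    body: object

# reverse-append, written against this AST:
#   revApp <l, acc> = case l of
#     Cons<x, xs> -> LET acc2 = Cons<x, acc> IN revApp<xs, acc2> | acc
revApp_body = Case("l", "Cons", ["x", "xs"],
                   LET("acc2", ConApp("Cons", ["x", "acc"]),
                       App("revApp", ["xs", "acc2"])),
                   Var("acc"))
```

Note how the recursive call takes only variables as arguments, as required by restriction iv).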
3 Schopenhauer Operational Semantics
Our operational semantics (Figure 1) is fairly standard, using a program signature Σ to map function identifiers to their defining bodies. The interesting
feature of our semantics is that it is instrumented by a (non-negative) resource counter, which defines the cost of each operation. This counter is intended to measure execution costs, with the execution being stuck if the counter becomes negative. We will prove later that our analysis determines an upper bound on the smallest starting value for this counter, and so prevents this from happening.
An environment, V, is a mapping from variables to locations, denoted by ℓ. A heap, H, is a partial map from locations to values w. H[ℓ ↦ w] denotes a heap that maps ℓ to value w and otherwise acts as H. Values are simple tuples whose first component is a flag that indicates the kind of the value, e.g. (bool, tt) for the boolean constant true, (int, 42) for the integer 42, etc. The judgement

    V, H ⊢ n/n′  e ⇝ ℓ, H′

then means that under the initial environment V and heap H, the expression e evaluates to location ℓ (all values are boxed) and post-heap H′, provided at least n units of the selected resource are available before the computation. Furthermore, n′ units are available after the computation. Hence, for example, V, H ⊢ 3/1 e ⇝ ℓ, H′ simply means that 3 resource units are sufficient for evaluating e, and that exactly one is unused after the computation. This one unit might, or might not, have been used temporarily. We will simply write V, H ⊢ e ⇝ ℓ, H′ if there exist n, n′ such that V, H ⊢ n/n′ e ⇝ ℓ, H′.
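The counter-instrumented evaluation can be mimicked by a toy evaluator (a sketch under our own simplifications: only integer constants are modelled, and the cost K_MK_INT = 2 is an assumed value for a heap metric, not fixed by the semantics):

```python
# Toy rendering of the instrumented semantics: evaluation threads a
# resource counter that must never go negative.

K_MK_INT = 2   # assumed cost of allocating a boxed integer (two heap units)

class Stuck(Exception):
    """Raised when fewer credits are available than an operation costs."""

def eval_const(n, credits):
    """Evaluate an integer constant with `credits` units available;
    returns the boxed value and the units left over."""
    if credits - K_MK_INT < 0:
        raise Stuck("insufficient credits")
    return ("int", n), credits - K_MK_INT

# Mirrors the judgement V,H |- 3/1 e ~> l,H': three units suffice for
# this constant and exactly one remains unused.
value, left = eval_const(42, 3)
print(value, left)  # ('int', 42) 1
```

With only one unit available, evaluation gets stuck, exactly as the instrumented semantics prescribes.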
Cost Parameters. The operational rules involve a number of constants which serve as parameters for an arbitrary cost model. For example, the constant KmkInt denotes the cost for an integer constant. If an integer occupies two heap units, and we are interested in heap usage, we set this constant to two; if each pointer occupies a single stack unit, and we are interested in stack usage, we set this value to one; and so on. Some cost parameters are parametrised to allow better precision to be obtained, e.g. for execution time, the cost of matching a constructor may vary according to the number of arguments it has.
It is important to note that our soundness proof does not rely on any specific values for these constants. Any suitable values may be used according to the required operational cost model. While it would be possible to expand the cost parameters to vectors, in order to deal with several simultaneous metrics, for example, this would require similar vector annotations in our type systems, at a high notational overhead, without making a new contribution.
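One can picture the cost parameters as one table per metric; a minimal sketch (field names are ours; the heap and stack values follow the text and Table 1 later in the paper):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class CostModel:
    k_mk_int: int                   # cost of an integer constant
    k_push_var: int                 # cost of a variable access
    k_cons: Callable[[int], int]    # cost of a k-argument constructor

# Heap metric: a boxed integer occupies two heap units; a k-argument
# constructor cell occupies 2 + k units; variable access allocates nothing.
heap = CostModel(k_mk_int=2, k_push_var=0, k_cons=lambda k: 2 + k)

# Stack metric: each pushed pointer occupies a single stack unit;
# a k-argument constructor has stack cost 1 - k (per Table 1).
stack = CostModel(k_mk_int=1, k_push_var=1, k_cons=lambda k: 1 - k)
```

Swapping one table for another changes the metric being bounded without touching the analysis itself, which is precisely the point of the parameterisation.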
4 Schopenhauer Type Rules
The annotated types of Schopenhauer are given by the following grammar:

    T ::= int | X | μX.{ c1:(q1, T⃗1) | ... | ck:(qk, T⃗k) } | T⃗ −(p/p′)→ T′

where X is a type variable, the ci ∈ Constrs are constructor labels; p, p′, qi are either non-negative rational constants or resource variables belonging to the infinite
    n ∈ Z        ℓ ∉ dom(H)
    ─────────────────────────────────────────────── (Int)
    V, H ⊢ (q′ + KmkInt)/q′  n ⇝ ℓ, H[ℓ ↦ (int, n)]

    V(x) = ℓ
    ─────────────────────────────────────────────── (Var)
    V, H ⊢ (q′ + KpushVar)/q′  x ⇝ ℓ, H

    Σ(fid) = (ef; y1,...,yk; C; ψ)
    [y1 ↦ V(x1),...,yk ↦ V(xk)], H ⊢ (q − Kcall(k))/(q′ + Kcall′(k))  ef ⇝ ℓ, H′
    ─────────────────────────────────────────────── (App)
    V, H ⊢ q/q′  fid ⟨x1,...,xk⟩ ⇝ ℓ, H′

    c ∈ Constrs    ℓ ∉ dom(H)    k ≥ 0    w = (constr_c, V(x1),...,V(xk))
    ─────────────────────────────────────────────── (Constr)
    V, H ⊢ (q′ + KCons(k))/q′  c ⟨x1,...,xk⟩ ⇝ ℓ, H[ℓ ↦ w]

    H(V(x)) = (constr_c, ℓ1,...,ℓk)
    V[y1 ↦ ℓ1,...,yk ↦ ℓk], H ⊢ (q − KCaseT(k))/(q′ + KCaseT′(k))  e1 ⇝ ℓ, H′
    ─────────────────────────────────────────────── (Case, match)
    V, H ⊢ q/q′  case x of c ⟨y1,...,yk⟩ -> e1 | e2 ⇝ ℓ, H′

    H(V(x)) ≠ (constr_c, ℓ1,...,ℓk)
    V, H ⊢ (q − KCaseF(k))/(q′ + KCaseF′(k))  e2 ⇝ ℓ, H′
    ─────────────────────────────────────────────── (Case, mismatch)
    V, H ⊢ q/q′  case x of c ⟨y1,...,yk⟩ -> e1 | e2 ⇝ ℓ, H′

    V, H ⊢ (q1 − KLet1)/q2  e1 ⇝ ℓ1, H1
    V[x ↦ ℓ1], H1 ⊢ (q2 − KLet2)/(q′ + KLet3)  e2 ⇝ ℓ2, H2
    ─────────────────────────────────────────────── (Let)
    V, H ⊢ q1/q′  let x = e1 in e2 ⇝ ℓ2, H2

Note that the rule for LET ... IN is identical to that for let ... in above, except in replacing the constants KLet1, KLet2 and KLet3 with KLET1, KLET2 and KLET3, respectively.

Fig. 1. Schopenhauer Operational Semantics
set of resource variables CV ranging over Q⁺; and we write T⃗ for ⟨T1 ... Tn⟩ where n ≥ 0. For convenience, we extend all operators pointwise when used in conjunction with the vector notation, i.e. A⃗ = B⃗ stands for ∀i. Ai = Bi. Let ψ, φ, ξ range over sets of linear inequalities over resource variables. We write ψ ⇒ φ to denote that ψ entails φ, i.e. all valuations v : CV → Q⁺ which satisfy ψ also satisfy all constraints in φ. We write v ⇒ φ if the valuation v satisfies all constraints in φ. We extend valuations to types and type contexts in the obvious way. Valuations using non-negative real numbers are permissible, but rational annotations are of most interest since they allow the use of in-place update, as described in [11].
Algebraic datatypes are defined as usual, except that the type carries a resource variable for each constructor. The type rules for Schopenhauer then govern how credits are associated with runtime values of an annotated type. The number of credits associated with a runtime value w of type A is denoted by Φ_H^v(w : A), formalised in Definition 2 in Section 5. Intuitively, it is the sum over all constructor nodes reachable from w, where the weight of each constructor in the sum is determined by the type A. As we have seen in the tree/mill example in the introduction, a single constructor node may contribute many times to this
sum, possibly each time with a different weight, determined by the type of the reference used to access it. While this definition is paramount to our soundness proof, any practical application only requires the computation of this number for the initial memory configuration, for which it can always be easily computed. It is easy to see, for example, that the number of credits associated with a list of integers having the type μX.{ Nil:(z0, ⟨⟩) | Cons:(z, ⟨int, X⟩) } is simply z0 + n·z, where n is the length of the list. We naturally extend this definition to environments and type contexts by summation over the domain of the context.
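The closed formula for the annotated list type is trivially computable; as a sketch (the weights z0 = 2 and z = 3/2 are made-up illustration values):

```python
from fractions import Fraction

def list_credits(z0, z, n):
    """Credits associated with an integer list of length n at type
    mu X.{ Nil:(z0, <>) | Cons:(z, <int, X>) }: one z per Cons cell
    plus z0 for the terminating Nil."""
    return z0 + n * z

# A list of length 4 with weights z0 = 2 and z = 3/2 carries
# 2 + 4 * 3/2 = 8 credits.
print(list_credits(Fraction(2), Fraction(3, 2), 4))  # 8
```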
We can now formulate the type rules for Schopenhauer (which are standard apart from the references to cost and resource variables). Let Γ denote a typing context mapping identifiers to annotated Schopenhauer types. The Schopenhauer typing judgement Γ ⊢ q/q′ e : A | φ then reads “for all valuations v that satisfy all constraints in φ, the expression e has Schopenhauer type v(A) under context v(Γ); moreover evaluating e under environment V and heap H requires at most v(q) + Φ_H^v(V : Γ) credits and leaves at least v(q′) + Φ_H′^v(ℓ : v(A)) credits available afterwards”. The types thus bound resource usage and we will formalise the above statement as our main theorem (Theorem 1), which requires as a precondition that the context, environment and heap are all mutually consistent.
A Schopenhauer program is a mapping Σ, called the signature of the program, which maps function identifiers fid belonging to the set Var to a quadruple consisting of: i) a term defining the function’s body; ii) an ordered list of argument variables; iii) a type; and iv) a set of constraints involving the annotations of the type. Since the signature Σ is fixed for each program to be analysed, for simplicity, we omit it from the premises of each type rule. A Schopenhauer program is well-typed if and only if for each identifier fid

    Σ(fid) = (efid; y1,...,ya; ⟨A1,...,Aa⟩ −(p/p′)→ C; ψ)  =⇒
    y1:A1,...,ya:Aa ⊢ (p − Kcall(a))/(p′ + Kcall′(a))  efid : C | ψ
Basic Expressions. Primitive terms have fixed costs. Requiring all available credits to be spent simplifies proofs, without imposing any restrictions, since a substructural rule allows costs to be relaxed where required.

    n ∈ Z
    ──────────────────────────── (Int)
    ∅ ⊢ KmkInt/0  n : int | ∅

    ──────────────────────────── (Var)
    x:A ⊢ KpushVar/0  x : A | ∅

Function Call. The cost of function application is represented by the constants Kcall(k) and Kcall′(k), which specify, respectively, the absolute costs of setting up before the call and clearing up after the call. In addition, each argument may carry further credits, depending on its type, to pay for the function’s execution. For simplicity, we have prohibited zero-arity function calls.

    Σ(fid) = (efid; y1,...,yk; ⟨A1,...,Ak⟩ −(p/p′)→ C; ψ)        k ≥ 1
    ──────────────────────────────────────────────────────── (App)
    y1:A1,...,yk:Ak ⊢ (p + Kcall(k))/(p′ − Kcall′(k))  fid ⟨y1,...,yk⟩ : C | ψ
Algebraic Datatypes. The Constr rule plays a crucial role in our annotated type system, since this is where available credits may be associated with a new data structure. Credits cannot be used while they are associated with data.

    c ∈ Constrs     C = μX.{ ··· c : (p, ⟨B1,...,Bk⟩) ··· }     Ai = Bi[C/X]  (for i = 1,...,k)
    ────────────────────────────────────────────────── (Constr)
    x1:A1,...,xk:Ak ⊢ (p + KCons(k))/0  c ⟨x1,...,xk⟩ : C | ∅

The dual to the above rule is the Case rule; the only point where credits associated with data can be released again. This is because this is the only point where we know about the actual constructor that is referenced by a variable, i.e. where we know whether a variable of a list type refers to a non-empty list, etc.

    c ∈ Constrs     A = μX.{ ··· c : (p, ⟨B1,...,Bk⟩) ··· }
    Γ, y1:B1[A/X],...,yk:Bk[A/X] ⊢ qt/qt′  e1 : C | ψt
    Γ, x:A ⊢ qf/qf′  e2 : C | ψf
    ψ = { p + q = qt + KCaseT(k),   qt′ = q′ + KCaseT′(k),
          q = qf + KCaseF(k),       qf′ = q′ + KCaseF′(k) }
    ────────────────────────────────────────────────── (Case)
    Γ, x:A ⊢ q/q′  case x of c ⟨y1,...,yk⟩ -> e1 | e2 : C | ψ ∪ ψt ∪ ψf
(Case)
Letbindings. The two rules for letexpressions are the only ones that thread
credits sequentially through the subrules. As in the operational semantics rules,
the type rule for LET ... IN is identical to that below, except in replacing KLet1,
KLet2, KLet3 with KLET1, KLET2, KLET3, respectively. Note that we use a comma
for the disjoint union of contexts throughout, hence duplicated uses of variables
must be introduced through the Share rule, described in the next paragraph.
Γ
q1
q?
1e1: A1  ψ1
∆,x:A1
1− KLet2
q2
q?
2e2: A2  ψ2
q?= q?
ψ0=?q1= q − KLet1
Γ,∆
q2= q?
2− KLet3
?
q
q?
let x = e1in e2 : A2  ψ0∪ ψ1∪ ψ2
(Let)
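For instance, instantiating the Let rule with the stack metric of Table 1 (KLet1 = 1, KLet2 = 0, KLet3 = −1) specialises the constraint set ψ0 to:

```latex
\psi_0 \;=\; \{\; q_1 = q - 1,\quad q_2 = q_1',\quad q' = q_2' + 1 \;\}
```

i.e. one stack unit is consumed before e1 and one is recovered after e2, matching the intuition that the let-bound reference occupies a stack slot for the duration of the body.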
Substructural rules. We use explicit substructural type rules. Apart from simplifying proofs, the Share rule makes multiple uses of a variable explicit. Unlike in a strictly linear type system, variables can be used several times. However, the types of all occurrences must “add up” in such a way that the total credit associated with all occurrences is no larger than the credit initially associated with the variable. It is the job of the Share rule to track multiple occurrences, and it is the job of the ⅄-function to apportion credits.

    Γ, x:B ⊢ q/q′  e : C | φ        ψ ⊢ A <: B
    ───────────────────────────────────────── (Supertype)
    Γ, x:A ⊢ q/q′  e : C | φ ∪ ψ

    Γ ⊢ q/q′  e : D | φ        ψ ⊢ D <: C
    ───────────────────────────────────────── (Subtype)
    Γ ⊢ q/q′  e : C | φ ∪ ψ
    Γ ⊢ p/p′  e : A | ψ        φ ⇒ ψ ∪ { q ≥ p,  q − p ≥ q′ − p′ }
    ───────────────────────────────────────── (Relax)
    Γ ⊢ q/q′  e : A | φ

    Γ ⊢ q/q′  e : C | ψ
    ───────────────────────────────────────── (Weak)
    Γ, x:A ⊢ q/q′  e : C | ψ

    Γ, x:A1, y:A2 ⊢ q/q′  e : C | φ
    ───────────────────────────────────────── (Share)
    Γ, z:A ⊢ q/q′  e[z/x, z/y] : C | φ ∪ ⅄(A | A1, A2)
The ternary function ⅄(A | B, C) is only defined for structurally-identical type triples which differ in at most the names of resource variables. It returns a set of constraints that enforce the property that each resource variable in A is equal to the sum of its counterparts in B and C. The crucial property of this function is expressed in Lemma 4. For example,

    A = μX.{ Nil:(a,⟨⟩) | Cons:(d,⟨int,X⟩) }     B = μX.{ Nil:(b,⟨⟩) | Cons:(e,⟨int,X⟩) }
    C = μX.{ Nil:(c,⟨⟩) | Cons:(f,⟨int,X⟩) }     ⅄(A | B, C) = { a = b + c,  d = e + f }

Subtyping. The type rules for subtyping depend on another inductively defined relation ξ ⊢ A <: B between two types A and B, relative to a constraint set ξ. For any fixed constraint set ξ, the relation is both reflexive and transitive.

    ───────────
    ξ ⊢ A <: A

    for all i holds ξ ⇒ { pi ≥ qi } and ξ ⊢ A⃗i <: B⃗i
    ──────────────────────────────────────────────────
    ξ ⊢ μX.{ ··· ci:(pi, A⃗i) ··· } <: μX.{ ··· ci:(qi, B⃗i) ··· }

    ξ ⇒ { p ≤ q,  p′ ≥ q′ }        ξ ⊢ B⃗ <: A⃗        ξ ⊢ C <: D
    ──────────────────────────────────────────────────
    ξ ⊢ A⃗ −(p/p′)→ C  <:  B⃗ −(q/q′)→ D

The inference itself follows straightforwardly from these type rules. First, a standard typing derivation is constructed, and each type occurrence is annotated with fresh resource variables. The standard typing derivation is then traversed once to gather all the constraints. Since we found this easier to implement, substructural rules have been amalgamated with the other typing rules. Because all types have been annotated with fresh resource variables, subtyping is required throughout. Subtyping simply generates the necessary inequalities between corresponding resource variables, and will always succeed, since it is only permitted between types that differ at most in their resource annotations. Likewise, the Relax rule is applied at each step, using the minimal constraints shown in the rule. (However, inequalities are turned into equalities using explicit slack variables, in order to minimise wasted credits.) The Weak and Share rules are applied based on the free variables of the subterms.
In the final step, the constraints that have been gathered are fed to an LP-solver [2]. Any solution that is found is presented to the user in the form of an annotated type and a human-readable closed cost formula. In practice, we have found that these constraints can be easily solved by a standard LP-solver running on a typical laptop or desktop computer, partly because of their structure [11]. Since only a single pass over the program code is needed to construct these constraints, this leads to a highly efficient analysis.
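For the annotated integer-list types of the sharing example, the ⅄-function can be sketched as a constraint generator (our own rendering, with resource variables represented by their names):

```python
def share_list(a, b, c):
    """Sharing constraints for annotated integer-list types
    mu X.{ Nil:(n, <>) | Cons:(m, <int, X>) }, each given as a pair
    (nil_weight, cons_weight) of resource-variable names: every weight
    of the shared type A equals the sum of its counterparts in B and C."""
    (a0, a1), (b0, b1), (c0, c1) = a, b, c
    return [f"{a0} = {b0} + {c0}", f"{a1} = {b1} + {c1}"]

# The example from the text: sharing A into B and C.
print(share_list(("a", "d"), ("b", "e"), ("c", "f")))
# ['a = b + c', 'd = e + f']
```

Constraints of exactly this shape are what the single traversal collects and hands to the LP-solver.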
5 Soundness of the Analysis
We now sketch the important steps for proving the main theorem. We first formalise the notion of a “well-formed” machine state, which simply says that for each variable, the type assigned by the typing context agrees with the actual value found in the heap location assigned to that variable by the environment. This is an essential invariant for our soundness proof.

Definition 1. A memory configuration consisting of heap H and stack V is well-formed with respect to context Γ and valuation v, written H ⊨v V : Γ, if and only if H ⊨v V(x) : Γ(x) can be derived for all variables x ∈ Γ.
    H(ℓ) = (int, n)        n ∈ Z
    ────────────────────────────
    H ⊨v ℓ : int

    H(ℓ) = (constr_c, ℓ1,...,ℓk)        C = μX.{ ··· c : (q, ⟨B1,...,Bk⟩) ··· }
    ∀i ∈ {1,...,k}. H ⊨v ℓi : Bi[C/X]
    ────────────────────────────
    H ⊨v ℓ : C

    H ⊨v ℓ : A        ∃φ. v ⇒ φ  ∧  φ ⊢ A <: B
    ────────────────────────────
    H ⊨v ℓ : B
Lemma 1. If H ⊨v V : Γ and V, H ⊢ e ⇝ ℓ, H′ then also H′ ⊨v V : Γ.

We remark that one might wish to prove a stronger statement to the effect that the result ℓ of the evaluation is also well-formed, given that the expression e was typeable. Unfortunately such a statement cannot be proven on its own and must necessarily be interwoven in Theorem 1.
We now formally define how credits are associated with runtime values, following our intuitive description from the previous section.

Definition 2. If H ⊨v ℓ : A holds, then Φ_H^v(ℓ : A) denotes the number of credits associated with location ℓ for type A in heap H under valuation v. This value is always zero, except when A is a recursive datatype, in which case it is recursively defined by

    Φ_H^v(ℓ : A) = v(q) + Σi Φ_H^v(ℓi : Bi[A/X])

when A = μX.{ ··· c:(q, ⟨B1,...,Bk⟩) ··· } and H(ℓ) = (constr_c, ℓ1,...,ℓk).
We extend to contexts by Φ_H^v(V : Γ) = Σ_{x ∈ dom(Γ)} Φ_H^v(V(x) : v(Γ(x))).
Subsumption cannot increase the number of associated credits.

Lemma 2. If H ⊨v ℓ : A and φ ⊢ A <: B holds and v is a valuation satisfying φ, then Φ_H^v(ℓ : A) ≥ Φ_H^v(ℓ : B).

If a reference is duplicated, then the type of each duplicate must be a subtype of the original type.

Lemma 3. If ⅄(A | B, C) = φ holds then also φ ⊢ A <: B and φ ⊢ A <: C.

The number of credits attached to any value of a certain type is always linearly shared between the two types introduced by sharing. In other words, the overall amount of available credits does not increase by using Share.

Lemma 4. If the judgements H ⊨v ℓ : A and ⅄(A | B, C) = φ hold and v satisfies the constraint set φ then Φ_H^v(ℓ : A) = Φ_H^v(ℓ : B) + Φ_H^v(ℓ : C). Moreover, for A = B and A = C, it follows that Φ_H^v(ℓ : A) = 0 also holds.

We can now formulate the main theorem (described intuitively in Section 4).
Theorem 1 (Soundness). Fix a well-typed Schopenhauer program. Let r ∈ Q⁺ be fixed, but arbitrary. If the following statements hold

    Γ ⊢ q/q′  e : A | φ                      (5.1)
    V, H ⊢ e ⇝ ℓ, H′                         (5.2)
    v : CV → Q⁺, satisfying φ                (5.3)
    H ⊨v V : v(Γ)                            (5.4)

then for all m ∈ N such that

    m ≥ v(q) + Φ_H^v(V : v(Γ)) + r           (5.5)

there exists m′ ∈ N satisfying

    V, H ⊢ m/m′  e ⇝ ℓ, H′                   (5.6)
    m′ ≥ v(q′) + Φ_H′^v(ℓ : v(A)) + r        (5.7)
    H′ ⊨v ℓ : v(A)                           (5.8)
The proof is by induction on the lengths of the derivations of (5.2) and (5.1)
ordered lexicographically, with the derivation of the evaluation taking priority
over the typing derivation. This is required since an induction on the length of
the typing derivation alone would fail for the case of function application, which
increases the length of the typing derivation. On the other hand, the length of the
derivation for the term evaluation never increases, but may remain unchanged
where the final step of the typing derivation was obtained by a substructural
rule. In these cases, the length of the typing derivation does decrease, allowing
an induction over lexicographically ordered lengths of both derivations.
The proof is complex, but unsurprising for most rules. The arbitrary value
r is required to “hide” excess credits when applying the induction hypothesis
for subexpressions, which leaves those credits untouched. We show one case to
provide some flavour of the overall proof:
Case Succeed: By the induction hypothesis, we obtain for all m0 ≥ v(qt) + Σi Φ_H^v(ℓi : Bi[A/X]) + Φ_H^v(V : Γ) + r a suitable m0′ ≥ v(qt′) + Φ_H′^v(ℓ : C) + r such that e1 evaluates under the annotated operational semantics with m0 and m0′. Observe that we have Φ_H^v(ℓ : A) = v(p) + Σi Φ_H^v(ℓi : Bi[A/X]) and v(p) + v(q) = v(qt) + KCaseT(k) and v(qt′) = v(q′) + KCaseT′(k). Therefore m = m0 + KCaseT(k) ≥ v(q) + v(p) + Σi Φ_H^v(ℓi : Bi[A/X]) + Φ_H^v(V : Γ) + r = v(q) + Φ_H^v(ℓ : A) + Φ_H^v(V : Γ) + r and m′ = m0′ − KCaseT′(k) ≥ v(q′) + Φ_H′^v(ℓ : C) + r as required.
Authors version for personal use only. Published at Formal Methods 2009. The original publication is available at www.springerlink.com
Constant     Stack      Heap    WCET‡
KmkInt       1          2       83
KpushVar     1          0       39
Kcall(k)     4 + k      0       142
Kcall'(k)    −(4 + k)   0       53 + Ret
KCons(k)     1 − k      2 + k   107 + 54k
KCaseT(k)    k − 1      0       301 + 80k
KCaseT'(k)   −k         0       65 + Ret
KCaseF(k)    0          0       205
KCaseF'(k)   0          0       56 + Ret
KLet1        1          0       142
KLet2        0          0       0
KLet3        −1         0       3 + Ret
KLET1        0          0       0
KLET2        0          0       0
KLET3        0          0       0
Table 1. Table of Resource Constants for Stack, Heap and Time
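To show how the constants in Table 1 are meant to be combined, a per-resource cost can be summed over an abstract-machine instruction trace; arity-dependent entries such as KCons(k) are functions of k. The trace below is a hypothetical hand-written sequence, not actual compiler output:

```python
# Heap column of Table 1; entries depending on the arity k are lambdas.
HEAP = {
    "KmkInt": 2,
    "KpushVar": 0,
    "KCons": lambda k: 2 + k,
    "KCaseT": lambda k: 0,
}

def heap_cost(trace):
    """Sum the heap cost of an abstract-machine instruction trace."""
    total = 0
    for op, *args in trace:
        entry = HEAP[op]
        total += entry(*args) if callable(entry) else entry
    return total

# Building the boxed list [1, 2]: two boxed ints plus two binary cons nodes.
trace = [("KmkInt",), ("KmkInt",), ("KCons", 2), ("KCons", 2)]
cost = heap_cost(trace)  # 2 + 2 + (2+2) + (2+2) = 12 heap cells
```

The stack and WCET columns would be handled identically, with separate tables.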
                N = 1            N = 2            N = 3            N = 4            N = 5
           Heap Stack  Time  Heap Stack  Time  Heap Stack  Time  Heap Stack  Time  Heap Stack  Time
revApp
 Analysis   14    25  2440    24    26  3596    34    27  4752    44    28  5908    54    29  7064
 Measured   14    24  1762    24    24  2745    34    24  3725    44    24  4707    54    24  5687
 Ratio       1  1.04  1.39     1  1.08  1.31     1  1.13  1.27     1  1.17  1.26     1  1.21  1.24
flatten
 Analysis   17    24  3311    34    34  6189    51    44  9067    68    54 11945    85    64 14823
 Measured   17    24  2484    34    33  4372    51    43  6260    68    43  8148    85    43 10036
 Ratio       1  1.00  1.33     1  1.03  1.42     1  1.02  1.45     1  1.26  1.47     1  1.49  1.48
Table 2. Measurement and Analysis Results for Tree-Flattening
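The Ratio rows in Table 2 are simply the analysis bound divided by the measured cost, rounded to two decimal places; this can be checked mechanically against the table's own entries:

```python
def ratio(analysis, measured):
    """Overestimation factor, as reported in the Ratio rows of Table 2."""
    return round(analysis / measured, 2)

# Spot-checks against Table 2 entries (analysis, measured, reported ratio).
checks = [
    (3311, 2484, 1.33),    # flatten, time column, N = 1
    (14823, 10036, 1.48),  # flatten, time column, N = 5
    (29, 24, 1.21),        # revApp, stack column, N = 5
]
results = [ratio(a, m) for a, m, _ in checks]
```

A ratio of exactly 1 (as in every heap column) indicates that the inferred bound coincides with the measured cost.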
6 Example Cost Analysis Results
In this section, we compare the bounds inferred by our analysis with concrete
measurements for one operational model. Heap and stack results were obtained
by instrumenting the generated code. Time measurements were obtained from
unmodified code on a 32MHz Renesas M32C/85U embedded microcontroller
with 32kB RAM. The cost parameters used for this operational model are shown
in Table 1. The time metrics were obtained by applying AbsInt GmbH’s aiT
tool [8] to the compiled code of individual abstract machine instructions.
Our first example is a simple tree-flattening function and its auxiliary function, reverse-append. The heap space consumption inferred by our analysis is encoded as the following annotated type
Schopenhauer typing for HumeHeapBoxed:
0, (tree[Leaf<10>:int | Node:#,#]) -(2/0)-> list[C:int,# | N], 0
which reads, “for a given tree with l leaves, the heap consumption is 10l +
2.” Table 2 compares analysis and measurement results. As test input, we use
‡ Returns are performed through a fixed-size table. On the Renesas M32C/85U this is
compiled to a series of branches, and the WCET therefore depends on the number of
calling points in the program. We have Ret = max(116, 51 + 15·(#ReturnLabels)).
balanced trees with N = 1...5 leaves. Heap prediction is an exact match for the
measured results. Stack prediction follows a linear bound, overestimating the actual
costs, which are logarithmic in general since a tail-recursive reverse function is used.
This linear bound is due to the design of our analysis, which cannot infer reuse
of stack space in all cases (Campbell has described an extension to our
approach [4] that may improve this). The predicted time costs are between 33%
and 48% higher than the measured worst cases.
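Reading a heap bound off such an annotated type is mechanical: count the constructors that carry credits and add the fixed credit on the arrow. A small sketch follows; the tuple encoding of trees is ours, while the per-leaf credit 10 and the fixed credit 2 are those of the annotation above:

```python
LEAF_CREDIT = 10  # credit attached to the Leaf constructor in the annotated type
FIXED = 2         # fixed credit attached to the function arrow

def leaves(t):
    """Count leaves of a tree encoded as ('Leaf', x) or ('Node', l, r)."""
    if t[0] == "Leaf":
        return 1
    _, l, r = t
    return leaves(l) + leaves(r)

def heap_bound(t):
    """Heap bound 10*l + 2 read off the annotated type."""
    return LEAF_CREDIT * leaves(t) + FIXED

# A balanced tree with three leaves.
t3 = ("Node", ("Leaf", 1), ("Node", ("Leaf", 2), ("Leaf", 3)))
bound = heap_bound(t3)  # 10*3 + 2 = 32
```

The Node constructor carries no credit here, so only the number of leaves matters: the bound is data-dependent, not merely size-dependent.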
6.1 Control Application: Inverted Pendulum
Our next example is an inverted pendulum controller. This implements a simple,
real-time control engineering problem. A pendulum is hinged upright at the end
of a rotating arm. Both rotary joints are equipped with angular sensors, which
are the inputs for the controller (arm angle θ and pendulum angle α). The
controller should produce as its output the electric potential for the motor that
rotates the arm in such a way that the pendulum remains in an upright position.
The Hume code comprises about 180 lines, which are translated into
about 800 lines of Schopenhauer code for analysis. The results for heap and stack
usage (upper part of Table 3) show exact matches in both cases. For time we
have measured the best-case (36118), worst-case (47635) and average (42222) number of
clock cycles required to process the control loop over 6000 iterations
during an actual run in which the Renesas M32C/85U controlled the
inverted pendulum. Compared to the worst-case execution time (WCET) bound
given by our automated analysis (63678), we have a margin of 33.7% between the
predicted WCET and the worst measured run. The hard real-time constraint on
this application is that the pendulum controller can only be made stable with a
loop time of less than about 10ms. The measured loop time is 1.488ms, while our
predicted loop time would be 1.989ms, showing that our controller is guaranteed
to be fast enough to successfully control the pendulum under all circumstances.
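The loop times quoted above are just the cycle counts divided by the 32 MHz clock rate; a quick check of the arithmetic, using the values from the text:

```python
CLOCK_HZ = 32_000_000  # Renesas M32C/85U clock rate

def loop_ms(cycles):
    """Loop time in milliseconds for a given cycle count."""
    return cycles / CLOCK_HZ * 1000

measured_worst = loop_ms(47635)           # ~1.488 ms, worst measured run
predicted = loop_ms(63678)                # ~1.989 ms, inferred WCET bound
margin_pct = (63678 / 47635 - 1) * 100    # ~33.7% margin over the worst run
```

Both figures sit comfortably below the ~10 ms stability threshold, which is what justifies the guarantee stated above.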
6.2 Control Application: Biquadratic Filter
Our final control application is a second-order recursive linear filter, a biquadratic
filter, so named because its transfer function is a ratio of two quadratic polynomials.
It is commonly used in audio work and can be used to implement low-pass,
high-pass, band-pass and notch filters. Well-known algorithms exist for computing
filter coefficients from the desired gain, centre frequency and sample rate [14].
The lower part of Table 3 compares analysis results against measured costs
for the components of the biquadratic filter. For heap, we obtain exact bounds for
all but one box. For stack, we find a close match between the bounds and the measured
values (within 12%). For time, however, the bounds are significantly worse
than the measured values. This is mainly due to the heavy use of floating-point
operations in this application, which are implemented in software on the Renesas
M32C/85U. This means that the WCET bounds for the primitive operations in
the analysis cost table are already very slack.
Box                   Analysis               Measured               Ratio
                 Heap  Stack   Time     Heap  Stack   Time     Heap  Stack  Time
pendulum
 control          299     93  63678      299     93  47635     1.00   1.00  1.34
biquad
 biquad            33     32  10330       33     32   5848     1.00   1.00  1.77
 compute filter    73     62  26392       73     59  13176     1.00   1.05  2.00
 compute params    40     38  47307       40     34  16107     1.00   1.12  2.94
 scale in          14     15   3919       10     15   1844     1.40   1.00  2.13
 scale out         33     33  16044       33     33   5920     1.00   1.00  2.71
Table 3. Comparison of Results for Pendulum and Biquad Filter Applications
The critical path through the system comprises the scale in, biquad and
scale out boxes. If we sum their bounds, we obtain a total of 30293 clock cycles,
or 947µs. This gives us a sample rate of about 1.056kHz, obviously well short
of audio sampling rates of about 48kHz. However, this is a concrete guarantee
for the application, and it tells us at an early stage, without any measurement,
that the hardware we are using is not fast enough for real-time audio processing.
The heap and stack bounds confirm, however, that we can fit static and
dynamic memory on the board. The components are executed sequentially, so
the largest component governs the dynamic memory for the entire system: this
is 73 heap + 62 stack cells for the compute filter box, or a maximum of 540
bytes of memory, well within our design maximum of 32kB.
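The figures in this paragraph can be reproduced directly from the Table 3 bounds (assuming 4-byte cells, which is consistent with the 540-byte total given above):

```python
CLOCK_HZ = 32_000_000  # Renesas M32C/85U clock rate

# WCET bounds on the critical path: scale in + biquad + scale out (Table 3).
critical_cycles = 3919 + 10330 + 16044           # = 30293 clock cycles
critical_us = critical_cycles / CLOCK_HZ * 1e6   # ~947 microseconds
sample_rate_hz = CLOCK_HZ / critical_cycles      # ~1056 Hz guaranteed rate

# Largest component: compute filter, 73 heap + 62 stack cells of 4 bytes each.
memory_bytes = (73 + 62) * 4                     # = 540 bytes
```

Comparing sample_rate_hz against the 48 kHz audio rate is exactly the kind of early, measurement-free design check the analysis enables.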
One distinctive feature of our analysis is that it attributes costs to individual
data type constructors. Therefore, our bounds are not only size-dependent, as
would be expected, but more generally data-dependent. For a worst-case execution
time analysis of compute filter, we produce the following explanation:
Worst-case Time-units required to compute box compute_filter once:
359 + 9374*X1 + 16659*X2 + 16123*X3 + 14570*X4
where
X1 = one if 1. wire is live, zero if the wire is void
X2 = number of "BPF" nodes at 1. position
X3 = number of "HPF" nodes at 1. position
X4 = number of "LPF" nodes at 1. position
In particular, since BPF, HPF and LPF are elements of an enumeration type,
selecting a band-pass, high-pass or low-pass filter, respectively, we know that
only one of the three costs attached to these constructors (16659, 16123 or 14570)
will apply. Furthermore, in the case where a null filter is selected, by providing
NULLF as input, none of these three costs applies and the time bound for this case
is therefore 9733 clock cycles. Being data-dependent, this parametrised bound
is more accurate than the worst-case bound given in Table 3, where we take
the worst case over all constructors to derive a value of 26392 clock cycles.
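The parametrised bound above can be evaluated directly. A small sketch follows; the coefficients are exactly those printed by the analysis, while the dictionary encoding is ours:

```python
BASE = 359    # fixed cost of executing the box once
LIVE = 9374   # cost attributed to a live first wire (X1)
FILTER = {"BPF": 16659, "HPF": 16123, "LPF": 14570, "NULLF": 0}

def wcet_bound(kind, wire_live=True):
    """Data-dependent WCET bound for compute_filter from the analysis output."""
    return BASE + (LIVE if wire_live else 0) + FILTER[kind]

null_bound = wcet_bound("NULLF")            # 359 + 9374 = 9733 cycles
worst = max(wcet_bound(k) for k in FILTER)  # BPF case: 26392 cycles
```

Taking the maximum over all constructors recovers precisely the context-free figure reported in Table 3, while a fixed filter selection yields the sharper, data-dependent bound.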
7 Related Work
While there has been significant interest in the use of amortised analysis for
resource usage, in contrast to the work presented in this paper, none of that work
considers multiple resources, studies worst-case execution time, or covers
arbitrary recursive data structures. In particular, a
notable difference from Tarjan's seminal work [17] (in addition to the fact that we
perform automatic inference) is that credits are associated on a per-reference
basis and not on the basis of the pure data layout in memory. Okasaki [15]
resorted to the use of lazy evaluation to solve this problem. In contrast, our
per-reference credits can be directly applied to strict evaluation.
Hofmann and Jost were the first to develop an automatic amortised analysis
for heap consumption [11], exploiting a difference metric similar to that used
by Crary and Weirich [7] (the latter, however, only check bounds, and do not
infer them, as we do). Hofmann and Jost have extended their method to cover
a comprehensive subset of Java, including imperative updates, inheritance and
type casts [12]. Shkaravska et al. subsequently considered heap consumption
inference for first-order polymorphic lists, and are currently studying extensions
to non-linear bounds [16]. Finally, Campbell [4] has developed the ideas of depth-based
and temporary credit uses to give better results for stack usage.
A related idea is that of sized types [13], which express bounds on data
structure sizes, and are attached to types in the same way as our weights. The
difference from our work is that sized types express bounds on the size of the
underlying data structure, whereas our weights are factors of the corresponding sizes,
which may remain unknown. The original work on sized types was limited to
type checking, but subsequent work has developed inference mechanisms [5,18].
A number of authors have recently studied analyses for heap usage. Albert
et al. [1] present a fully automatic, live heap-space analysis for an object-oriented
bytecode language with a scoped-memory manager. Most notably, it is
not restricted to a certain complexity class, and produces a closed-form upper-bound
function over the size of the input. However, unlike our system, data
dependencies cannot be expressed. Braberman et al. [3] infer polynomial bounds
on the live heap usage for a Java-like language with automatic memory management.
However, unlike our system, they do not cover general recursive methods.
Finally, Chin et al. [6] present a heap and a stack analysis for a low-level
(assembler) language with explicit (de)allocation, which is also restricted to linear
bounds.
8 Conclusions and Further Work
By developing a new type-based analysis, we have been able to automatically
infer linear bounds on real-time, heap and stack costs for strict functional programs
with algebraic datatypes. The use of amortised costs allows us to determine a
provable upper bound on the overall resource cost of running a program by
attaching numerical annotations to constructors. Thus, our analysis is not just
size-dependent but also data-dependent. We have extended previous work on
the inference of amortised costs [11] by considering arbitrary (recursive) data
structures and by constructing a generic treatment of resource usage through
our resource tables. In this way, we are able to separate the mechanics of our
approach from the operational semantics that applies to the usage of a given
resource. Previous work [10,11,12,18] has been restricted to the treatment of a
single resource type, and usually also to list homomorphisms. For all programs
studied here, we determine very tight bounds on both heap and stack usage.
Our results show that the bounds we infer for worst-case execution times can be
within 33.7% of the measured costs. However, in some degenerate cases they can
be significantly higher (in some cases due to the use of software floating-point
operations, whose time behaviour can be difficult to analyse effectively).
We are currently experimenting with a number of further extensions. We
have developed a working prototype implementation dealing with higher-order
functions with flexible cost annotations and partial application
(http://www.embounded.org/software/cost/cost.cgi). The corresponding (and extensive)
theoretical proof is, however, still in preparation. This implementation also deals
with many useful extended language constructs, such as optimised conditionals
for a boolean base type, pattern matches having multiple cases, multiple let
definitions, etc. Most of these extensions are theoretically straightforward, and
in the interest of brevity we have therefore excluded them from this paper.
We now intend to study how to improve our time results, to determine how
to extend our work to non-linear bounds, and to determine whether sized types
can be effectively combined with amortised analysis. We are also working to
extend our study of worst-case execution time so that it covers other interesting
embedded systems architectures, e.g. the Freescale MPC555 for automotive
applications. Since this has a hardware floating-point unit, we anticipate that
the issues we have experienced with software floating-point operations on the
Renesas M32C/85U will no longer be a concern on this new architecture.
Acknowledgements
We thank Hugo Simões and our anonymous reviewers for their useful comments,
and Christoph Herrmann for performing measurements on the Renesas architecture.
This work is supported by EU Framework VI grants IST-510255 (EmBounded)
and IST-15905 (Mobius), and by EPSRC grant EP/F030657/1 (Islay).
References
1. E. Albert, S. Genaim, and M. Gómez-Zamalloa. Live Heap Space Analysis for
Languages with Garbage Collection. In Proc. ISMM 2009: Intl. Symp. on Memory
Management, pages 129–138, Dublin, Ireland, June 2009. ACM.
2. M. Berkelaar, K. Eikland, and P. Notebaert. lp_solve: Open Source (Mixed-Integer)
Linear Programming System. GNU LGPL (Lesser General Public Licence).
http://lpsolve.sourceforge.net/5.5.