# Code-Size Minimization in Multiprocessor Real-Time Systems

**ABSTRACT** Program code size is a critical factor in determining the manufacturing cost of many embedded systems, particularly those aimed at the extremely cost-conscious consumer market. However, the focus of most prior theoretical research on partitioning algorithms for real-time multiprocessor platforms has been on ensuring that the cumulative computing requirements of the tasks assigned to each processor do not exceed the processor's computing capacity. We consider the problem of task partitioning in multiprocessor platforms in order to minimize the total code size, in application systems in which there may be several different implementations of each task available, with each implementation having different code sizes and different computing requirements. We prove that the general problem is intractable, and present polynomial-time algorithms for solving (well-defined) special cases of the general problem.



Sanjoy Baruah and Nathan Fisher
Department of Computer Science
The University of North Carolina at Chapel Hill


Index Terms—Multiprocessor systems; Partitioned scheduling; Minimal-memory partitioning; Multiple task implementations.

I. INTRODUCTION

As the functionality demanded of real-time embedded systems has increased, it is becoming unreasonable to expect to implement them upon uniprocessor platforms [22]; hence, multiprocessor platforms are increasingly used for implementing such systems. This is particularly true for systems that are aimed at the consumer market, where cost considerations rule out the use of the most powerful (and expensive) processors. Efficient system implementation on such multiprocessor platforms may require the careful management of several key resources, such as processor capacity, memory capacity, communication bandwidth, etc.

Supported in part by the National Science Foundation (Grant Nos. ITR-0082866, CCR-0204312, and CCR-0309825).

For many embedded applications, a major determinant of system cost is the total amount of memory needed. For such systems the program code size is a critical factor in determining the manufacturing cost of the system [5], [17], since reducing code size yields an implementation that needs less memory. One promising code-size reduction technique that has recently been much explored is to use processor architectures that support multiple instruction sets. Examples include the ARM Thumb [3] and MIPS16 [19], each of which has two instruction sets: a normal 32-bit instruction set and a smaller 16-bit instruction set with a smaller set of opcodes and access to fewer registers. During run-time, 16-bit instructions may be dynamically decompressed by hardware into 32-bit equivalent ones before execution: this approach reduces the program code size at the cost of increased computation during run-time. Processors supporting dual instruction sets typically allow programs to contain a mix of normal-mode and reduced-width-mode instructions, by providing a single instruction that toggles between the two modes. This feature affords the system designer the capability of considering a range of different implementations of any particular process or task, each of which may choose a different tradeoff between code size and execution time by having a different fraction of its code compressed.
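The compression choices thus give each task a menu of (code-size, utilization) pairs, of which only the Pareto-optimal ones ever need to be considered: a variant that is no better in either parameter can simply be discarded. A minimal Python sketch of such pruning (the function name and list representation are our own illustration, not part of the paper):

```python
def prune_dominated(impls):
    """Keep only the Pareto-optimal (code_size, utilization) pairs,
    returned in the paper's indexing order: decreasing code size,
    increasing utilization."""
    kept = []
    for size, util in sorted(impls):        # scan in ascending code size
        # Keep a variant only if it beats the utilization of every
        # variant with smaller-or-equal code size seen so far.
        if not kept or util < kept[-1][1]:
            kept.append((size, util))
    kept.sort(key=lambda p: -p[0])          # decreasing code size
    return kept
```

For example, a variant of size 0.3 and utilization 0.45 is dropped when a variant of size 0.25 and utilization 0.4 exists, since the latter is smaller and lighter at once.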

In this paper, we address the following question: Given a multiprocessor platform comprised of m such processors, and a collection of n tasks each with up to t different implementations, determine a partitioning of the tasks among the processors such that the memory required for storing the program code is minimized. We focus upon shared-memory multiprocessors (SMPs), in which all the code is stored in the shared memory; however, our techniques are easily adapted to handle distributed-memory multiprocessors as well, in which each processor has local memory and the code for a task assigned to a processor is resident in that processor's local memory.

Most prior theoretical research on partitioning algorithms for real-time multiprocessor platforms has focused on ensuring that the cumulative computing requirements of the tasks assigned to each processor do not exceed the processor's computing capacity [13], [11]. Our research can be considered to be a generalization of this earlier work, in the sense that there is an additional criterion to be optimized – namely, the total amount of memory used to store the program code.

The remainder of this paper is organized as follows. In Section II, we formally define the problem that we wish to solve, prove that it is intractable, and briefly list related research. In Section III, we describe how our problem may be mapped on to an equivalent Integer Linear Programming (ILP) problem. In Section IV, we briefly review some properties of linear programs. In Section V, we use these properties to derive an efficient approximate algorithm for obtaining a mapping of tasks to processors. We conclude in Section VI, with a summary of the results presented here and a brief discussion of ongoing research into extensions to these results.

II. SYSTEM MODEL

In this paper, we consider the problem of mapping a given collection of tasks upon a platform comprised of multiple processors. We will assume that all processors are identical, in the sense that they have exactly the same computing capacity (and, in the distributed-memory model, the same amount of local memory) available. Each task may have up to t distinct implementations, where t is some known constant. Each implementation of a task is characterized by two parameters:

• its utilization, denoting the computing capacity that is needed for executing it; and

• its code-size, denoting the amount of memory that is needed for storing its program code.

We now describe how systems comprised of such tasks, to be scheduled upon multiprocessor platforms comprised of identical processors, may be formally denoted.

Definition 1 (System Specification): Let m be a positive integer and c a positive real number, and let u_{n×t} and s_{n×t} denote two (n × t) matrices of non-negative real numbers. Then Γ = ⟨u_{n×t}, s_{n×t}, m, c⟩ denotes the system consisting of n tasks that is to be implemented on a platform comprised of m identical processors, each of computing capacity c. The j'th implementation of the i'th task has utilization u_{i,j} and code-size s_{i,j} respectively. Without loss of generality, we will assume that the implementations of each task are indexed according to decreasing code-size (and, correspondingly, increasing utilization¹); if there are k < t implementations for a particular task i, we will set s_{i,k+1}, s_{i,k+2}, ..., s_{i,t} each equal to zero, and u_{i,k+1}, u_{i,k+2}, ..., u_{i,t} each equal to a number greater than c (thereby implying that the corresponding implementation will not "fit" on a processor). We therefore have u_{i,1} ≤ u_{i,2} ≤ ··· ≤ u_{i,t}, and s_{i,1} ≥ s_{i,2} ≥ ··· ≥ s_{i,t}, for all i, 1 ≤ i ≤ n.
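Definition 1's padding rule is mechanical; a small Python sketch (our own helper, not from the paper; the value c + 1.0 is an arbitrary choice of "a number greater than c" — Example 1 below happens to use 1.1):

```python
def pad_implementations(impls, t, c):
    """Pad a task's implementation list out to exactly t entries, as in
    Definition 1: missing slots get code-size 0 and a utilization larger
    than c, so they can never fit on any processor.

    `impls` is a list of (code_size, utilization) pairs, already indexed
    by decreasing code-size / increasing utilization."""
    padded = list(impls) + [(0.0, c + 1.0)] * (t - len(impls))
    sizes = [size for size, _ in padded]    # this task's row of s
    utils = [util for _, util in padded]    # this task's row of u
    return utils, sizes
```

Applied to a task with only two implementations, say (0.3, 0.05) and (0.15, 0.1) with t = 3 and c = 1, the third slot becomes code-size 0 with an oversized utilization.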

We illustrate our specification by an example.

Example 1: Consider ⟨u_{n×t}, s_{n×t}, m, c⟩ with n = 5, t = 3, m = 2, and c = 1, and let u_{n×t} and s_{n×t} be as presented in Figure 1.

Each row of the two matrices taken together specifies one task. Let us, for instance, consider the fourth row. There are two possible implementations of the corresponding task: one that has code-size equal to 0.3 and processor utilization equal to 0.05, and a second in which the code-size is halved but the utilization is doubled. (Note that it is a coincidence here that halving the code-size doubles the utilization — the model does not require that the code-size be linearly related to utilization. Thus, for example, the third implementation of the first task has code-size one-half that of the first implementation, while its utilization is only 1⅔ times as much.)

One further definition.

Definition 2: For any U ≥ 0, let ⟨u_{n×t}, s_{n×t}, m, c⟩_U denote the system obtained from the system ⟨u_{n×t}, s_{n×t}, m, c⟩ by deleting all those task implementations for which the utilization is > U. That is, ⟨u_{n×t}, s_{n×t}, m, c⟩_U is obtained from ⟨u_{n×t}, s_{n×t}, m, c⟩ by setting u_{i,j} to be greater than c and s_{i,j} equal to zero, for every (i,j) such that u_{i,j} > U in ⟨u_{n×t}, s_{n×t}, m, c⟩, and keeping the remaining parameters unchanged.

Example 2: Consider the system ⟨u_{5×3}, s_{5×3}, 2, 1⟩ specified in Example 1. The system ⟨u_{5×3}, s_{5×3}, 2, 1⟩_{0.5}

¹Note that it makes no sense to consider two implementations such that both the code-size and the utilization of one are smaller than those of the other — from the perspective of our problem, the implementation with the smaller parameters is superior to the other implementation, which therefore needs no further consideration.
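The masking operation of Definition 2 is easy to sketch in Python (our own helper; c + 1.0 stands in for the paper's "greater than c" sentinel):

```python
def restrict(u, s, c, U):
    """Definition 2: form <u, s, m, c>_U by masking every implementation
    whose utilization exceeds U -- its u entry is raised above c and its
    s entry is zeroed; all other parameters are unchanged."""
    u_U = [[uij if uij <= U else c + 1.0 for uij in row] for row in u]
    s_U = [[sij if uij <= U else 0.0 for uij, sij in zip(urow, srow)]
           for urow, srow in zip(u, s)]
    return u_U, s_U
```

For instance, a task with utilization row (0.1, 0.4, 0.6) and code-size row (0.5, 0.4, 0.3) loses its third implementation when U = 0.5 and c = 1.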


u_{5×3} =
    0.30  0.40  0.50
    0.10  0.40  0.60
    0.15  0.25  0.30
    0.05  0.10  1.10
    0.30  1.10  1.10

and s_{5×3} =
    0.40  0.25  0.20
    0.50  0.40  0.30
    0.40  0.25  0.20
    0.30  0.15  0.00
    0.60  0.00  0.00

Fig. 1. Task specifications for Example 1.

is obtained from the original system by "eliminating" all task implementations in which the utilization exceeds 0.5; this results in

u_{5×3} =
    0.30  0.40  0.50
    0.10  0.40  1.10
    0.15  0.25  0.30
    0.05  0.10  1.10
    0.30  1.10  1.10

and

s_{5×3} =
    0.40  0.25  0.20
    0.50  0.40  0.00
    0.40  0.25  0.20
    0.30  0.15  0.00
    0.60  0.00  0.00

We are now ready to define our problem precisely.

Definition 3 (Code-size minimal task assignment): Given a system ⟨u_{n×t}, s_{n×t}, m, c⟩, determine a choice function θ : {1,...,n} → {1,...,t} and a processor mapping function χ : {1,...,n} → {1,...,m} such that the following m conditions are satisfied:

    Σ_{all i | χ(i)=k} u_{i,θ(i)} ≤ c,    for all k, 1 ≤ k ≤ m    (1)

and the following quantity is minimized:

    Σ_{i=1}^{n} s_{i,θ(i)}    (2)

Intuitively, the choice function θ(i) designates which of the available alternative implementations of task i is chosen, and the processor mapping function χ(i) designates which processor this implementation goes on. The m conditions (1) assert that the utilization bound of each processor is respected, while Expression (2) represents the total amount of memory that is needed to store the chosen program code.
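Checking conditions (1) and evaluating (2) for candidate θ and χ is straightforward; a Python sketch using the Figure 1 data (the function name, 1-indexed list encoding of θ and χ, and the small floating-point tolerance are our own choices):

```python
# Matrices of Figure 1.
U5x3 = [[0.30, 0.40, 0.50],
        [0.10, 0.40, 0.60],
        [0.15, 0.25, 0.30],
        [0.05, 0.10, 1.10],
        [0.30, 1.10, 1.10]]
S5x3 = [[0.40, 0.25, 0.20],
        [0.50, 0.40, 0.30],
        [0.40, 0.25, 0.20],
        [0.30, 0.15, 0.00],
        [0.60, 0.00, 0.00]]

def assignment_cost(u, s, m, c, theta, chi):
    """Given 1-indexed theta and chi as lists (theta[i-1] = theta(i)),
    return the total code size (2) if conditions (1) hold, else None."""
    n = len(u)
    load = [0.0] * (m + 1)                 # load[k]: utilization on processor k
    for i in range(n):
        load[chi[i]] += u[i][theta[i] - 1]
    if any(load[k] > c + 1e-9 for k in range(1, m + 1)):   # condition (1)
        return None
    return sum(s[i][theta[i] - 1] for i in range(n))       # quantity (2)
```

The epsilon merely guards against binary round-off in sums such as 0.6 + 0.3 + 0.1; exact rational arithmetic would make it unnecessary.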

Example 3: Let us return to the system considered in Example 1. Observe that the numbers in the first column of the utilization matrix (Figure 1) sum to 0.9; hence, the collection of task implementations obtained by taking the first (minimum-utilization) implementation of each task is in fact uniprocessor feasible on a unit-capacity processor. The total code-size for these five implementations is the first column-sum of the s matrix, and equals 2.2.

It can be shown through exhaustive enumeration that the total code-size is minimized on the two unit-capacity processors that are available for the following choice function:

    i     | 1  2  3  4  5
    θ(i)  | 3  3  3  2  1

and the following processor mapping function:

    i     | 1  2  3  4  5
    χ(i)  | 1  2  2  2  1

In this partitioning, the total utilization on processor 1 is 0.5 + 0.3 = 0.8, while on processor 2 it is 0.6 + 0.3 + 0.1 = 1.0. The total code-size is 0.2 + 0.3 + 0.2 + 0.15 + 0.6 = 1.45.
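The exhaustive enumeration behind Example 3 is tiny at this scale (3⁵ · 2⁵ = 7776 candidate pairs of functions) and can be sketched directly in Python; this brute-force helper is ours, for illustration only, and is of course exponential in n:

```python
from itertools import product

# Matrices of Figure 1.
U5x3 = [[0.30, 0.40, 0.50],
        [0.10, 0.40, 0.60],
        [0.15, 0.25, 0.30],
        [0.05, 0.10, 1.10],
        [0.30, 1.10, 1.10]]
S5x3 = [[0.40, 0.25, 0.20],
        [0.50, 0.40, 0.30],
        [0.40, 0.25, 0.20],
        [0.30, 0.15, 0.00],
        [0.60, 0.00, 0.00]]

def brute_force_min_cost(u, s, m, c):
    """Enumerate every choice function and processor mapping (0-indexed)
    and return the minimum total code size over the feasible ones."""
    n, t = len(u), len(u[0])
    best = None
    for theta in product(range(t), repeat=n):
        for chi in product(range(m), repeat=n):
            load = [0.0] * m
            for i in range(n):
                load[chi[i]] += u[i][theta[i]]
            if max(load) <= c + 1e-9:              # conditions (1)
                cost = sum(s[i][theta[i]] for i in range(n))
                if best is None or cost < best:
                    best = cost
    return best
```

On the Example 1 system this reproduces the minimum of 1.45 quoted above.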

It is not difficult to see that the code-size minimal task assignment problem is intractable; indeed, even severely restricted versions are provably intractable:

Theorem 1: The code-size minimal task assignment problem is intractable, even under either of the following two restrictions: (i) there is only one processor (m = 1); or (ii) each task has only one possible implementation (t = 1).

Proof Sketch: On uniprocessors, the problem can be transformed to the multiple-choice knapsack problem [18], [10], [1], which is known to be NP-hard. On multiprocessors but with each task having only one implementation, the problem can be transformed to the bin-packing problem [7], [6], which is known to be NP-hard in the strong sense.

Our results. In this paper, we derive a polynomial-time algorithm for obtaining code-size minimal task assignments that makes the following performance guarantee.

If there exists an implementation of ⟨u_{n×t}, s_{n×t}, m, c⟩ of cost C and some constant U (U < c) such that (i) at most an amount (c − U) of each processor is used, and (ii) no individual task occupies more than U of the capacity of any processor (i.e., if the j'th implementation of task i is the one selected, then u_{i,j} ≤ U), then our algorithm will produce an implementation of ⟨u_{n×t}, s_{n×t}, m, c⟩ of cost at most C.

However, it is possible that there is an implementation of the system of cost C which does not satisfy the conditions listed above; in that case, our algorithm may fail to find an implementation of cost C — given that the problem is NP-hard (Theorem 1), this is only to be expected.

Related research. When the code-size minimization objective may be ignored, task partitioning on multiprocessors is essentially a bin-packing [7], [6] problem: each processor is a "bin" of capacity one, and each task assigned to it consumes an amount of this capacity equal to its utilization. This relationship between bin-packing and task partitioning is explored in, e.g., [13], [11]. In addition to bin-packing based research, there is much prior work on multiprocessor task scheduling and allocation problems based upon heuristic approaches. Algorithms have been proposed based on genetic algorithms [12], constraint logic programming [20], [21], integer programming [15], and other heuristic approaches [4]; while it may be of some interest to determine whether such approaches are applicable for the code-size minimal task assignment problem, this is not within the scope of the current paper.

The issue of obtaining code-size minimal implementations of real-time systems upon uniprocessor platforms has been studied by Shin et al. [17], for a more general periodic task model – one in which tasks may have arbitrary initial arrival-times, and deadlines distinct from their periods. Since the uniprocessor feasibility-analysis problem for collections of such periodic tasks is known to be NP-hard in the strong sense, several heuristics are proposed in [17], and are evaluated via simulations.

ILP(⟨u_{n×t}, s_{n×t}, m, c⟩):

Minimize

    Σ_{i=1}^{n} Σ_{all j,k} (x_{i,j,k} × s_{i,j})    (3)

subject to the following constraints, and the restriction that the x_{i,j,k} variables take on integer values only:

    Σ_{all j,k} x_{i,j,k} = 1                 (i = 1, 2, ..., n)    (4a)
    Σ_{all i,j} (x_{i,j,k} · u_{i,j}) ≤ c     (k = 1, 2, ..., m)    (4b)

Fig. 2. ILP representation of the code-size minimal task-assignment problem.

III. AN ILP FORMULATION

In an Integer Linear Program (ILP), one is given a set of variables, some or all of which are restricted to take on integer values only, and a collection of "constraints" that are expressed as linear inequalities over the variables. The set of all points over which all the constraints hold is called the feasible region for the integer linear program. One may also be given an "objective function," expressed as a linear function of these variables; the goal is then to find the extremum (maximum or minimum) value of the objective function over the feasible region.

Consider any system ⟨u_{n×t}, s_{n×t}, m, c⟩. For any mapping of the n tasks on the m processors, let us define (n × t × m) indicator variables x_{i,j,k}, for i = 1, 2, ..., n; j = 1, 2, ..., t; and k = 1, 2, ..., m. Variable x_{i,j,k} is set equal to one if

    θ(i) = j and χ(i) = k;

i.e., if the j'th implementation of the i'th task is mapped onto the k'th processor, and zero otherwise.

We can represent the code-size minimal task assignment problem as the integer programming problem of Figure 2, with the variables x_{i,j,k} restricted to non-negative integer values.
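Assembling the data of this integer program is mechanical; a plain-Python sketch (the flattening order of the x_{i,j,k} variables and the function name are our own choices — the resulting vectors could be fed to any off-the-shelf ILP solver, or to an LP solver for the relaxation):

```python
def build_ilp(u, s, m, c):
    """Build the objective vector (3) and the constraint rows (4a)/(4b)
    over the n*t*m indicators x[i][j][k], flattened in (i, j, k) order."""
    n, t = len(u), len(u[0])
    nvars = n * t * m
    var = lambda i, j, k: (i * t + j) * m + k   # flat index of x[i][j][k]
    cost = [0.0] * nvars                        # objective (3): sum x * s
    for i in range(n):
        for j in range(t):
            for k in range(m):
                cost[var(i, j, k)] = s[i][j]
    a_eq = [[0.0] * nvars for _ in range(n)]    # (4a): one row per task
    for i in range(n):
        for j in range(t):
            for k in range(m):
                a_eq[i][var(i, j, k)] = 1.0
    b_eq = [1.0] * n
    a_ub = [[0.0] * nvars for _ in range(m)]    # (4b): one row per processor
    for k in range(m):
        for i in range(n):
            for j in range(t):
                a_ub[k][var(i, j, k)] = u[i][j]
    b_ub = [c] * m
    return cost, a_eq, b_eq, a_ub, b_ub
```

Each (4a) row has t·m ones (every implementation-processor pair for one task), and each (4b) row carries the utilizations of every implementation, restricted to one processor.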

The n constraints corresponding to (4a) above assert that each task be assigned some processor, while the m constraints corresponding to (4b) assert that no processor's computing capacity is exceeded. It is not hard to see that an assignment of non-negative integer values to the variables x_{i,j,k} satisfying these constraints is equivalent to a feasible partitioning of the n tasks upon the m processors. Thus, obtaining a solution to the ILP (4) above is equivalent to determining whether a given system is feasible. This is formally stated by the following theorem:

Theorem 2: The Integer Linear Programming problem (4) has a solution if and only if the multiprocessor system is feasible.

Theorem 2 above allows us to transform the code-size minimal task assignment problem to an ILP problem. At first sight, this may seem to be of limited significance, since ILP is also known to be intractable (NP-complete in the strong sense [14]). However, some recently devised approximation techniques for solving ILP problems, based upon the idea of LP relaxations of ILP problems, may prove useful in obtaining approximate solutions to the code-size minimal task assignment problem – we explore these approximation techniques in the remainder of this paper.

IV. A REVIEW OF SOME RESULTS ON LINEAR PROGRAMMING

In this section, we briefly review some facts concerning linear programming (LP) that will be used in later sections. In a Linear Program (LP) over a given set of n variables, as with ILPs, one is given a collection of constraints that are expressed as linear inequalities over these n variables, and an objective function, expressed as a linear function of these variables. The region in n-dimensional space over which all the constraints hold is again called the feasible region for the linear program, and the goal is to find the extremal value of the objective function over the feasible region. A region is said to be convex if, for any two points p1 and p2 in the region and any scalar λ, 0 ≤ λ ≤ 1, the point (λ·p1 + (1−λ)·p2) is also in the region. A vertex of a convex region is a point p in the region such that there are no distinct points p1 and p2 in the region and a scalar λ, 0 < λ < 1, such that p = λ·p1 + (1−λ)·p2.

It is known that an LP can be solved in polynomial time by the ellipsoid algorithm [9] or the interior-point algorithm [8]. (In addition, the exponential-time simplex algorithm [2] has been shown to perform extremely well "in practice," and is often the algorithm of choice despite its exponential worst-case behaviour.) We do not need to understand the details of these algorithms: for our purposes, it suffices to know that LP problems can be solved efficiently (in polynomial time).

We now state without proof some basic facts concerning such linear programming optimization problems.

Fact 1: The feasible region for an LP problem is convex, and the objective function reaches its optimal value at a vertex point of the feasible region.

An optimal solution to an LP problem that is a vertex point of the feasible region is called a basic solution to the LP problem.

Fact 2: Consider a linear program on n variables x1, x2, ..., xn, in which each variable is subject to the constraint that it be at least 0 (these constraints are called non-negativity constraints). Suppose that there are a further m linear constraints. If m < n, then at most m of the variables have non-zero values at each vertex of the feasible region² (including the basic solution).

Note that Fact 1 above does not claim that all points in the feasible region that correspond to optimal solutions to an LP are vertex points; rather, the claim is that some vertex point is guaranteed to be in the set of optimal solutions. For LP problems with a unique optimal solution, it is guaranteed that this unique solution is a vertex, and hence all (correct) LP solvers will return a basic solution. For LP problems that have several solutions, however, interior-point or ellipsoid algorithms do not guarantee to find a vertex solution (although the simplex algorithm does). There are efficient polynomial-time algorithms (see, e.g., [16]) for obtaining a basic solution given any non-vertex optimal solution to an LP problem – if the LP-solver being used does not guarantee to return a basic solution, then one of these algorithms may be used to obtain a basic solution from the optimal solution returned by the LP-solver.

V. AN APPROXIMATE ALGORITHM

In this section, we derive a polynomial-time algorithm that obtains a code-size minimal task assignment of a given system ⟨u_{n×t}, s_{n×t}, m, c⟩, provided ⟨u_{n×t}, s_{n×t}, m, c⟩ satisfies certain conditions. Let U denote a positive real number no larger than c. Recall (from Definition 2) that ⟨u_{n×t}, s_{n×t}, m, c⟩_U denotes the restricted system obtained from ⟨u_{n×t}, s_{n×t}, m, c⟩ by

²The feasible region in n-dimensional space for this linear program is the region over which all the n+m constraints (the non-negativity constraints, plus the m additional ones) hold.