Code-size Minimization in Multiprocessor
Sanjoy Baruah and Nathan Fisher
Department of Computer Science
The University of North Carolina at Chapel Hill
Abstract—Program code size is a critical factor in
determining the manufacturing cost of many embedded
systems, particularly those aimed at the extremely cost-
conscious consumer market. However, most prior theo-
retical research on partitioning algorithms for real-time
multiprocessor platforms has focused only on ensuring
that the cumulative computing requirements of the tasks
assigned to each processor do not exceed the processor’s
computing capacity. We consider the problem of task par-
titioning in multiprocessor platforms in order to minimize
the total code size, in application systems in which there
may be several different implementations of each task
available, with each implementation having a different code
size and different computing requirements. We prove that
the general problem is intractable, and present polynomial-
time algorithms for solving (well-defined) special cases of
the general problem.
scheduling; Minimal-memory partitioning; Multiple task
As the functionality demanded of real-time embed-
ded systems has increased, it is becoming unreasonable
to expect to implement them upon uniprocessor platforms;
hence, multiprocessor platforms are increasingly used
for implementing such systems. This fact
is particularly true for systems that are aimed at the
consumer market, where cost considerations rule out the
use of the most powerful (and expensive) processors.
Efficient system implementation on such multiprocessor
platforms may require the careful management of sev-
eral key resources, such as processor capacity, memory
capacity, communication bandwidth, etc.
Supported in part by the National Science Foundation (Grant Nos.
ITR-0082866, CCR-0204312, and CCR-0309825).
For many embedded applications, a major determinant
of system cost is the total amount of memory needed.
For such systems the program code size is a critical
factor in determining the manufacturing cost of the
system, since reducing code size results in
an implementation that needs less memory. One promising
code size reduction technique that has recently been
much explored is to use processor architectures that
support multiple instruction sets. Examples include the
ARM Thumb and MIPS16, each of which has
two instruction sets: a normal 32-bit instruction set and
a smaller 16-bit instruction set with a smaller set of
opcodes and access to fewer registers. During run-time,
16-bit instructions may be dynamically decompressed by
hardware into 32-bit equivalent ones before execution:
this approach reduces the program code size at the cost
of increased computation during run-time. Processors
supporting dual instruction sets typically allow programs
to contain a mix of normal mode and reduced-width
mode instructions, by providing a single instruction that
toggles between the two modes. This feature affords the
system designer the capability of considering a range
of different implementations of any particular process
or task, each of which may choose a different tradeoff
between code size and execution time by having a
different fraction of its code compressed.
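The tradeoff described above can be illustrated with a minimal sketch. It assumes a purely hypothetical linear model in which compressing a fraction f of a task's code scales the size of that portion by `size_ratio` and its execution time by `slowdown`; these two parameters, their default values, and the names `Implementation`, `implementations`, and `smallest_within_budget` are illustrative assumptions and are not taken from this paper or from any processor manual.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Implementation:
    compressed_fraction: float  # share of the code in reduced-width mode
    code_size: float            # e.g. bytes
    exec_time: float            # execution requirement, arbitrary units


def implementations(base_size: float, base_time: float,
                    fractions: Sequence[float],
                    size_ratio: float = 0.7,
                    slowdown: float = 1.4) -> List[Implementation]:
    """Enumerate candidate implementations of one task.

    ASSUMPTION (hypothetical model): the compressed portion shrinks by
    size_ratio and runs slower by the factor slowdown; the remainder of
    the code is unchanged.
    """
    result = []
    for f in fractions:
        size = base_size * ((1.0 - f) + f * size_ratio)
        time = base_time * ((1.0 - f) + f * slowdown)
        result.append(Implementation(f, size, time))
    return result


def smallest_within_budget(impls: Sequence[Implementation],
                           time_budget: float) -> Optional[Implementation]:
    """Return the smallest implementation whose time fits the budget."""
    feasible = [i for i in impls if i.exec_time <= time_budget]
    if not feasible:
        return None
    return min(feasible, key=lambda i: i.code_size)


if __name__ == "__main__":
    # Hypothetical task: 1000 bytes and 10 time units when uncompressed.
    candidates = implementations(1000.0, 10.0, [0.0, 0.5, 1.0])
    for c in candidates:
        print(c)
    print(smallest_within_budget(candidates, time_budget=12.5))
```

Under this assumed model, a larger compressed fraction always gives a smaller but slower implementation, so the choice for a single task in isolation reduces to picking the most compressed variant that still fits the available execution time.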
In this paper, we address the following question:
Given a multiprocessor platform comprised of m such
processors, and a collection of n tasks each with up to
t different implementations, determine a partitioning of
the tasks among the processors such that the memory
required for storing the program code is minimized. We
focus upon shared-memory multiprocessors (SMPs), in
which all the code is stored in the shared memory;
however, our techniques are easily adapted to handle dis-
tributed memory multiprocessors, in which each proces-
there may be several different implementations of each
task available, with each implementation having different
code sizes and different computing requirements. We
have formalized this problem as the code-size minimal
task assignment problem, have shown that this problem
is intractable even under severe simplifying assumptions,
and have derived efficient approximate algorithms for
The results presented in this paper can be extended
to task assignment algorithms for memory-constrained
multiprocessor systems with multiple task implementations.
In a memory-constrained system, the amount of
available memory capacity (either distributed or shared)
for program code is known a priori. The task assignment
algorithm presented in Figure 4 can easily be extended
to a memory-constrained system with shared memory
by running the algorithm and checking that the resulting memory
requirement, C, does not exceed the memory capacity.
However, the algorithm will require modification for task
assignment in distributed memory systems with memory
constraints. For the time being, we leave this problem
ARMSTRONG, R. D., KUNG, D. S., SINHA, P., AND ZOLTNERS, A. A.
A computational study of a multiple-choice
ACM Trans. Math. Softw. 9, 2 (1983),
DANTZIG, G. B. Linear Programming and Extensions. Princeton
University Press, 1963.
GOUDGE, L., AND SEGARS, S. THUMB: Reducing the cost
of 32-bit RISC performance in portable and consumer applications.
In Proceedings of COMPCON (1996).
GRANDPIERRE, T., LAVARENNE, C., AND SOREL, Y. Rapid
prototyping for real-time embedded heterogeneous multiprocessors.
In International Workshop on Hardware/Software Co-Design
(CODES) (Rome, Italy, 1999), ACM Press.
HALAMBI, A., SHRIVASTAVA, A., BISWAS, P., DUTT, N., AND
NICOLAU, A. An efficient compiler technique for code size
reduction using reduced bit-width ISAs. In Proceedings of
DATE: Design, Automation and Test in Europe (2002), pp. 402–
JOHNSON, D. Fast algorithms for bin packing.
Computer and Systems Science 8, 3 (1974), 272–314.
JOHNSON, D. S. Near-optimal Bin Packing Algorithms. PhD
thesis, Department of Mathematics, Massachusetts Institute of
KARMARKAR, N. A new polynomial-time algorithm for linear
programming. Combinatorica 4 (1984), 373–395.
KHACHIYAN, L. A polynomial algorithm in linear programming.
Doklady Akademiia Nauk SSSR 244 (1979), 1093–1096.
KUNG, D. S. The Multiple Choice Knapsack Problem: Algorithms
and Applications. PhD thesis, The University of Texas
at Austin, 1982.
LOPEZ, J. M., GARCIA, M., DIAZ, J. L., AND GARCIA, D. F.
Worst-case utilization bound for EDF scheduling in real-time
multiprocessor systems. In Proceedings of the EuroMicro
Conference on Real-Time Systems (Stockholm, Sweden, June
2000), IEEE Computer Society Press, pp. 25–34.
MADSEN, J., AND BJORN-JORGENSEN, P. Embedded system
synthesis under memory constraints. In International Workshop
on Hardware/Software Co-Design (CODES) (Rome, Italy,
1999), ACM Press.
OH, D.-I., AND BAKER, T. P. Utilization bounds for N-processor
rate monotone scheduling with static processor assignment.
Real-Time Systems: The International Journal of
Time-Critical Computing 15 (1998), 183–192.
PAPADIMITRIOU, C. H. On the complexity of integer programming.
Journal of the ACM 28, 4 (1981), 765–768.
PRAKASH, S., AND PARKER, A. C. Synthesis of application-
specific multiprocessor systems including memory components.
Journal of VLSI Signal Processing 8 (1994), 97–116.
SCHRIJVER, A. Theory of Linear and Integer Programming.
John Wiley and Sons, 1986.
SHIN, I., LEE, I., AND MIN, S. L. Embedded system design
framework for minimizing code size and guaranteeing real-time
requirements. In Proceedings of the IEEE Real-Time Systems
Symposium (Austin, TX, December 2002), IEEE Computer
Society Press, pp. 201–211.
SINHA, P., AND ZOLTNERS, A. A. The multiple choice
knapsack problem. Operations Research 27 (1979), 503–515.
SWEETMAN, D. See MIPS Run. Morgan Kaufmann, San
Francisco, CA, 1999.
SZYMANEK, R. W., AND KUCHCINSKI, K.
algorithm for memory-aware task assignment and scheduling.
In International Workshop on Hardware/Software Co-Design
(CODES) (Copenhagen, Denmark, 2001), ACM Press.
SZYMANEK, R. W., AND KUCHCINSKI, K. Partial task assignment
of task graphs under heterogeneous resource constraints.
In International ACM/IEEE Design Automation Conference
(DAC) (Anaheim, CA, 2003), ACM Press, pp. 244–249.
WOLFE, W. Computers as Components: Principles of Embedded
Computing Systems Design. Morgan Kaufmann Publishers,