The Design and Application of a Retargetable Peephole Optimizer

Abstract

Peephole optimizers improve object code by replacing certain sequences of instructions with better sequences. This paper describes PO, a peephole optimizer that uses a symbolic machine description to simulate pairs of adjacent instructions, replacing them, where possible, with an equivalent single instruction. As a result of this organization, PO is machine independent and can be described formally and concisely: when PO is finished, no instruction, and no pair of adjacent instructions, can be replaced with a cheaper single instruction that has the same effect. This thoroughness allows PO to relieve code generators of much case analysis; for example, they might produce only load/add-register sequences and rely on PO to, where possible, discard them in favor of add-memory, add-immediate, or increment instructions. Experiments indicate that naive code generators can give good code if used with PO.
JACK W. DAVIDSON and CHRISTOPHER W. FRASER
University of Arizona
Key Words and Phrases: code generation, optimization, peephole, portability
CR Categories: 4.12
1. INTRODUCTION
Of all optimizations, those applied to object code are among the least understood.
Ad hoc instruction sets complicate formal treatment and portability. However,
experience shows the value of object code optimization; even compilers with
thorough global optimization reduce code size by 15-40 percent with object code
optimization [12]. This is no surprise. To be machine independent, global opti-
mization usually precedes code generation; to be simple and fast, code generators
usually operate locally; so the code generator produces code fragments that can
be locally optimal but may be suboptimal when juxtaposed. For example, local
code for a source program conditional ends with a branch; so does local code for
the end of a loop. Consequently, a conditional at the end of a loop becomes a
branch to a branch. Changing the code generator to handle such situations (for
instance, to generate only one branch) complicates its case analysis combinato-
rially, since each combination of language features may admit some optimization
[6]. It is easier to simplify the code generator and to subsequently optimize object
code.
Permission to copy without fee all or part of this material is granted provided that the copies are not
made or distributed for direct commercial advantage, the ACM copyright notice and the title of the
publication and its date appear, and notice is given that copying is by permission of the Association
for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific
permission.
A preliminary report on this research was presented at the 6th Annual ACM Symposium on Principles
of Programming Languages, San Antonio, Tex., January 1979.
This work was supported by the National Science Foundation under Grant MCS78-02545.
Authors' address: Department of Computer Science, University of Arizona, Tucson, AZ 85721.
© 1980 ACM 0164-0925/80/0400-0191 $00.75
ACM Transactions on Programming Languages and Systems, Vol. 2, No. 2, April 1980, Pages 191-202.
Little has been published on object code optimization, and some early object
code optimizations [2, 8, 9] (e.g., constant folding, exponentiation via multiplication)
are now performed at a higher level [1, 11]. Recent object code optimizers
[4, 12] delete unnecessary tests (a previous instruction may have incidentally set
the condition code), exploit special case instructions and exotic address calcula-
tions, collapse chains of branches, delete unreachable code, and simulate register
contents to replace, where possible, memory references with equivalent register
references. They typically detect and correct only a few machine-specific patterns.
PO is a retargetable peephole optimizer. Given an assembly language program
and a symbolic machine description, PO simulates pairs of adjacent instructions
and, where possible, replaces them with an equivalent single instruction. PO
makes one pass to determine the effect of each instruction, a second to collapse
pairs of effects, and a third to select the cheapest instruction for each resulting
effect.
There are several advantages to this organization. PO is easily retargetted by
changing machine descriptions; the PDP-10, PDP-11, and Cyber 175 have already
been accommodated. PO is not cluttered by ad hoc case analysis because it
combines all possible adjacent pairs, not just branch chains or constant compu-
tations or any other special cases. As a result, PO's effect can be described
formally and concisely: when PO is finished, no instruction, and no pair of
adjacent instructions, can be replaced with a cheaper single instruction that has
the same effect. Subsequent sections explain "adjacent" and "cheaper" and show
how this thoroughness may relieve code generators of much case analysis.
The two-instruction window catches inefficiencies at the boundaries between
fragments of locally generated code. It is not designed to catch others, so PO is
best used with a high-level, machine-independent global optimizer.
2. MACHINE DESCRIPTIONS
To simulate an instruction, PO must know its syntax and its effect. Effects are
represented as ISP register transfers [3]; for example,
R[3] ← R[3] + 1                   increments register 3
M[c] ← 0                          clears memory location c
PC ← (NZ = 0 → 140 else PC)       jumps to 140 if NZ is zero
PO assumes that PC names the program counter. No other names are assumed
to have specific meanings, so PO can accommodate single-accumulator and stack
machines as easily as general register machines.
A machine is described by a grammar for syntax-directed translation between
assembly language and register transfers. Productions are displayed in three
columns: the nonterminal being defined (in italic lowercase letters), the assembler
syntax for that nonterminal (with terminal symbols in boldface uppercase letters),
and the corresponding register transfer syntax. For example, these productions
from a PDP-11 description
nonterminal   assembler syntax pattern   register transfer pattern
*s            i                          i
*d            i                          i
inst          CLR d                      d ← 0; NZ ← 0 ? 0
inst          ADD s, d                   d ← d + s; NZ ← d + s ? 0
state that the CLR instruction clears its destination operand, that the ADD
instruction adds its source to its destination, and that both set condition code
bits N and Z to indicate whether the result is negative, zero, or positive. The
asterisk preceding the definitions of d and s tells PO that all of their instances
must match identical substrings of the register transfer. PO assumes that the
program counter is automatically incremented, so this effect need not be made
explicit. The productions
nonterminal   assembler syntax pattern   register transfer pattern
i             a                          a
i             @a                         M[a]
state that a word operand a may be preceded by a "@" for an indirect reference.
Finally, the productions
nonterminal   assembler syntax pattern   register transfer pattern
a             x                          M[x]
a             #x                         x
a             Rn                         R[n]
a             x(Rn)                      M[R[n]+x]
define how the word operand a may appear in assembly code: the lone address x
stands for the named memory location M[x], #x stands for the literal value x, Rn
stands for the named register R[n], and x(Rn) stands for an indexed address
where address x is the base and register R[n] holds the offset. The primitive
nonterminals are x, which stands for a symbolic address, and n, which stands for
a register index. Appendix A contains additional descriptions for the PDP-11.
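The operand productions above can be read as a small translator from assembler operand syntax to register transfer syntax. The following Python sketch is illustrative only: the function name `operand_to_rt` and the regular expressions are assumptions made here, not part of PO (which, as noted later, is a SNOBOL program), and only the productions for x, #x, Rn, x(Rn), and the "@" prefix are covered.

```python
import re

def operand_to_rt(operand: str) -> str:
    """Translate a PDP-11 word operand into register transfer notation."""
    operand = operand.strip()
    if operand.startswith("@"):                    # i -> @a     gives M[a]
        return f"M[{operand_to_rt(operand[1:])}]"
    if operand.startswith("#"):                    # a -> #x     gives x
        return operand[1:]
    m = re.fullmatch(r"R(\d+)", operand)           # a -> Rn     gives R[n]
    if m:
        return f"R[{m.group(1)}]"
    m = re.fullmatch(r"(\w+)\(R(\d+)\)", operand)  # a -> x(Rn)  gives M[R[n]+x]
    if m:
        return f"M[R[{m.group(2)}]+{m.group(1)}]"
    return f"M[{operand}]"                         # a -> x      gives M[x]
```

Under these assumptions, `operand_to_rt("@R3")` yields `M[R[3]]` and `operand_to_rt("12(R5)")` yields `M[R[5]+12]`.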
Details irrelevant to the object code may be omitted from the machine
description. For example, PO does not need to know how the condition code
represents comparisons, so the machine description does not say. Similarly,
instructions that PO has little or no chance of collapsing (e.g., HALT, block
moves, subroutine calls, and returns) may be omitted. Undescribed instructions
may appear in programs--PO will not disturb them.
The machine description may disguise awkward machine features. For example,
PDP-11 conditional branches can only reach nearby words; assemblers--and PO
machine descriptions based on them--disguise this fact by allowing conditional
branches to arbitrary targets and translating them into two-instruction sequences
when "short" branches cannot reach. Similarly, the PDP-11 hardware does not
offer immediate operands; instead it offers "autoincrement" addressing, which
references indirectly through an index register that is subsequently incremented.
Since the program counter is also an index register, assemblers--and PO machine
descriptions--offer immediate operands by generating the less obvious autoincre-
ment through the program counter [10].
Since PO knows target machines only through these patterns, it is retargetted
by describing a different instruction set. Its few machine dependencies are
assumptions built into its algorithms and machine description language. For
example, the simulator assumes that the machine uses a program counter and
that cells, once set, stay set: PO cannot optimize code that uses changing device
registers. Similarly, PO's machine descriptions cannot easily represent instruc-
tions with internal loops (e.g., block moves, which may appear in programs but
will not be collapsed). In general, such assumptions can be removed by extending
PO. As it stands, PO can handle most instructions on most machines.
3. DETERMINING EFFECTS
PO needs to know the effect, in register transfers, of each instruction. If PO is
built into a compiler, the code generator can emit register transfers equivalent to
the object code instructions that it would otherwise generate, and PO can proceed
directly with collapsing pairs of instructions. On the other hand, if PO is to accept
assembly code, it must first make a pass to determine the effect of each assembler
statement in isolation (so PO assumes that programs do not modify themselves).
Given an assembler statement, it seeks a matching assembler syntax pattern and
returns the corresponding register transfer pattern, with pattern variables eval-
uated. For example, the instruction
ADD #2, R3
matches the syntax pattern
ADD s, d
so PO substitutes 2 (the translation of #2) for s and R[3] (the translation of R3)
for d in the register transfer pattern
d ← d + s; NZ ← d + s ? 0
and obtains
R[3] ← R[3] + 2; NZ ← R[3] + 2 ? 0
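The matching and substitution step just described can be sketched in a few lines of Python. This is a hedged illustration, not PO's implementation: the `PATTERNS` table, the helper `operand_to_rt`, and the function `effect` are hypothetical names, only two instructions are described, and the ASCII `<-` stands in for the register transfer assignment arrow.

```python
import re

# Hypothetical pattern table: mnemonic -> (operand names, register transfer pattern).
PATTERNS = {
    "CLR": (("d",), "d <- 0; NZ <- 0 ? 0"),
    "ADD": (("s", "d"), "d <- d + s; NZ <- d + s ? 0"),
}

def operand_to_rt(operand):
    """Translate the few operand forms needed here: @a, #x, Rn, and a bare address x."""
    operand = operand.strip()
    if operand.startswith("@"):
        return f"M[{operand_to_rt(operand[1:])}]"
    if operand.startswith("#"):
        return operand[1:]
    m = re.fullmatch(r"R(\d+)", operand)
    return f"R[{m.group(1)}]" if m else f"M[{operand}]"

def effect(statement):
    """Return the register transfers of one assembler statement, or None if undescribed."""
    mnemonic, _, rest = statement.strip().partition(" ")
    if mnemonic not in PATTERNS:
        return None  # undescribed instructions are left undisturbed
    names, pattern = PATTERNS[mnemonic]
    binding = dict(zip(names, (operand_to_rt(op) for op in rest.split(","))))
    # Replace each pattern variable in a single pass so substituted text is not rescanned.
    return re.sub(r"\b[sd]\b", lambda m: binding[m.group(0)], pattern)

print(effect("ADD #2, R3"))  # R[3] <- R[3] + 2; NZ <- R[3] + 2 ? 0
```

Returning `None` for an undescribed mnemonic mirrors the statement that undescribed instructions may appear in programs and are not disturbed.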
Programs typically ignore some effects of some instructions. For example, a
chain of arithmetic instructions may set and reset condition codes without ever
testing them. PO can do a better job if such useless register transfers are removed
from an instruction's register transfer list. For example, the full effect of the
instruction above includes assignments to the N and Z bits. If the next instruction
changes N and Z without testing them, its useful effect is only
R[3] ← R[3] + 2
If the previous instruction references R[3] indirectly, the useful effect may be
had by autoincrementing instead and removing the ADD instruction. The full
effect requires the ADD instruction, since autoincrementing does not set the
condition code. Consequently, when initially determining each instruction's effect,
PO ignores effects on such "dead" variables. To do so, the initial pass scans the
program backward and associates with each instruction both its useful effect and
a list of cells that are unused and therefore may be changed arbitrarily. Each
instruction's list is computed to be that of its lexical successor, plus the cells its
successor sets, minus the cells its successor examines. If the instruction branches,
its list is taken to be empty, since the dead variables depend on the destination
of the branch. Full dead variable elimination (considering control flow and
subscripting) [7] is an unnecessary expense, for this simpler analysis permits the
first pass over the code to eliminate most "extra" effects such as condition code
setting. As a bonus, code not subjected to dead variable elimination at a higher
level enjoys a measure of it now: instructions with no effect are removed. If PO
is used with a code generator that produces register transfers instead of assembly
code, the code generator will not produce extra effects and can report when
temporaries become dead, so a pass for dead variable identification should not be
needed.
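The backward scan for dead cells can be sketched as follows. This is a minimal illustration under stated assumptions: the `Instr` record and its field names are hypothetical, cells are represented as plain strings, and the last instruction of the program is given an empty list because it has no lexical successor.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    text: str
    sets: frozenset            # cells the instruction assigns
    uses: frozenset            # cells the instruction examines
    branches: bool = False
    dead: frozenset = frozenset()

def annotate_dead(program):
    """Scan backward, recording for each instruction the cells that are dead after it."""
    successor = None
    for instr in reversed(program):
        if successor is None or instr.branches:
            # Dead cells after a branch depend on its destination, so assume none.
            instr.dead = frozenset()
        else:
            # Successor's list, plus what the successor sets, minus what it examines.
            instr.dead = (successor.dead | successor.sets) - successor.uses
        successor = instr
    return program

def useful_sets(instr):
    """Cells whose assignment must be kept; assignments to dead cells may be dropped."""
    return instr.sets - instr.dead
```

For an `ADD #2,R3` followed by an instruction that sets N and Z without testing them, `useful_sets` reports only `R[3]`, which matches the useful effect shown above.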
4. COLLAPSING PAIRS
Once PO knows the isolated effect of each instruction, it passes forward over the
program and considers the combined effect of lexically adjacent instructions;
where possible, it replaces such pairs with a single instruction having the same
effect. PO learns the effect of a pair by combining their independent effects and
substituting the values assigned to variables in the first for instances of those
variables in the second. The effect of
SUB #2, R3
CLR @R3
is (ignoring dead variable elimination)
R[3] ← R[3] - 2; NZ ← R[3] - 2 ? 0
M[R[3]] ← 0; NZ ← 0 ? 0
which simplifies to
R[3] ← R[3] - 2; M[R[3] - 2] ← 0; NZ ← 0 ? 0
PO now seeks a single instruction with a register transfer pattern matching this
effect. It finds the autodecrement version of CLR
CLR -(R3)
A register transfer pattern matches if it performs all register transfers requested
and if the rest of its register transfers set harmless dead variables (e.g., the
condition code). After each replacement, PO backs up one instruction--to con-
sider the new adjacency between the new instruction and its predecessor--and
continues.
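The combination of two effects by substitution can be sketched with small expression trees. Everything here is an assumption made for illustration: cells are tuples such as `("cell", "R3")` and `("M", address)`, operators are tuples such as `("-", a, b)`, and an effect is a dictionary from destination cell to expression. The sketch ignores memory aliasing (two different address expressions that happen to name the same word) and does not perform algebraic simplification.

```python
def subst(expr, env):
    """Value of expr once the assignments in env (cell -> expression) have taken place."""
    if not isinstance(expr, tuple):
        return expr                          # constant or symbolic address
    if expr[0] == "M":
        cell = ("M", subst(expr[1], env))    # rewrite the address first
        return env.get(cell, cell)
    if expr[0] == "cell":
        return env.get(expr, expr)
    return (expr[0],) + tuple(subst(arg, env) for arg in expr[1:])  # operator node

def combine(first, second):
    """Effect of executing `first` and then `second`, expressed over the initial state."""
    result = dict(first)
    for dest, expr in second.items():
        if dest[0] == "M":
            dest = ("M", subst(dest[1], first))   # destination address sees first's values
        result[dest] = subst(expr, first)         # later assignments override earlier ones
    return result

R3, NZ = ("cell", "R3"), ("cell", "NZ")
sub_effect = {R3: ("-", R3, 2), NZ: ("?", ("-", R3, 2), 0)}   # SUB #2, R3
clr_effect = {("M", R3): 0, NZ: ("?", 0, 0)}                  # CLR @R3
pair_effect = combine(sub_effect, clr_effect)
```

With these definitions `pair_effect` assigns `R3 - 2` to `R3`, `0` to the memory word at `R3 - 2`, and `0 ? 0` to `NZ`, which is the simplified effect given in the text.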
Pairs that start with a branch need special treatment. The condition on which
the branch depends must be inverted and added to the register transfers of the
second instruction before combining effects. For example, the two instructions
PC ← (NZ = 0 → l1 else PC)
PC ← l2
l1:
combine to
PC ← (NZ = 0 → l1 else PC); PC ← (not NZ = 0 → l2 else PC)
l1:
A symbolic simplifier now improves awkward relationals and deletes redundant
PC assignments, yielding
PC ← (NZ ≠ 0 → l2 else PC)
l1:
for the example above. Unconditional branches depend on the constant condition
true; the symbolic simplifier deletes register transfers depending on its inverse,
removing unreachable code.
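The treatment of pairs that start with a branch can be sketched for the two cases discussed here. This is a simplified illustration, not PO's symbolic simplifier: a branch is assumed to be a `(relation, target)` tuple testing "NZ relation 0", a relation of `None` marks an unconditional branch, and the names `INVERSE` and `collapse_branch_pair` are hypothetical.

```python
# Inverse of each relation "NZ rel 0" that a conditional branch may test.
INVERSE = {"=": "!=", "!=": "=", "<": ">=", ">=": "<", ">": "<=", "<=": ">"}

def collapse_branch_pair(first, second, following_label):
    """Combine two adjacent branches, each given as (relation, target).

    following_label is the label, if any, on the instruction after the pair.
    The caller is assumed to have checked that `second` carries no label.
    """
    rel1, target1 = first
    rel2, target2 = second
    if rel1 is None:
        # The second branch is guarded by "not true", so it can never execute.
        return [first]
    if rel2 is None and target1 == following_label:
        # Jumping to the very next instruction is redundant; only the
        # inverted condition guarding the second branch remains.
        return [(INVERSE[rel1], target2)]
    return [first, second]  # nothing this sketch knows how to improve
```

For the example above, `collapse_branch_pair(("=", "l1"), (None, "l2"), "l1")` returns the single branch `("!=", "l2")`.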
Labels prevent the consideration of some pairs. Combining pairs whose second
instruction is labeled changes, erroneously, the effect of programs that jump to
the label to include the effect of the first instruction. PO must ignore such pairs
and assume that all branches are to explicit labels. To improve its chances, PO
removes any labels it can. When it encounters a label, it looks for a reference to
it; if it finds none--possibly because optimizations like the one above have
removed them all--PO removes the label and tries combining the two instructions
that it separated. This technique enables PO to remove the last three branches
in the large example below.
When PO removes the last reference to a label that it has passed, it should
back up to reconsider the instructions the label separated: new optimizations are
possible after the label is removed. This reconsideration is needed only for labels
referenced following their definition. However, when optimizing the code gener-
ated locally from a program with "structured" control flow, loop and subroutine
heads are the only such labels, and peephole optimizers seldom remove these
labels. So this particular form of backup, though easily implemented and theo-
retically necessary, was discarded as ineffective.
PO collapses branch chains by treating a branch and its target as an extra pair.
If an instruction branches to l, PO concatenates the branch instruction with the
instruction at l, attempts optimization of this pair, and replaces the first branch
(leaving the instruction at l alone) if possible. For example, the PDP-10 sequence
JRST l1
...
l1: AOJG 3, l2
has the effect
PC ← l1
...
l1: R[3] ← R[3] + 1; PC ← (R[3]+1 > 0 → l2 else PC)
which combines to
R[3] ← R[3] + 1; PC ← (R[3]+1 > 0 → l2 else PC)
...
l1: R[3] ← R[3] + 1; PC ← (R[3]+1 > 0 → l2 else PC)
so PO replaces the first instruction with the second. (Note that the second
instruction may now be unreachable.) PO does not make the replacement if it
requires the introduction of a new label. For example, if the second instruction
had the effect
l1: R[3] ← R[3] + 1
PO could replace the first instruction with
AOJG 3,l1+1
but it does not because, as shown above, introducing new labels (l1+1) prevents
the consideration of other pairs. PO combines only physically adjacent instruc-
tions and branch chains.
When this pass reaches the end of the program, PO makes a third and final
pass to translate the remaining register transfers back to assembly code. When
searching for an instruction that realizes a particular set of register transfers, PO
scans the instruction list in order, so cheaper instructions should be described
first. Occasionally two (or more) instructions are better than one; the general
solution to this problem is to add instruction timings to machine descriptions,
but it is less expensive and just as practical to describe the two-instruction
sequence as a macro instruction and place it before the less desirable single
instruction. This third pass could be absorbed into the second pass if the second
pass kept track of register transfers and the equivalent assembly code.
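The in-order search for an instruction can be sketched as a first-match scan over a list ordered from cheapest to most expensive. This is a simplified illustration under stated assumptions: the candidate instructions in `INSTRUCTIONS` are already instantiated for particular operands (PO itself matches patterns with variables), effects are dictionaries from cell to expression string, and the names `INSTRUCTIONS` and `select` are hypothetical.

```python
# Hypothetical candidates, cheapest first, each with its full effect.
INSTRUCTIONS = [
    ("INC R3",    {"R[3]": "R[3] + 1", "NZ": "R[3] + 1 ? 0"}),
    ("ADD #1,R3", {"R[3]": "R[3] + 1", "NZ": "R[3] + 1 ? 0"}),
]

def select(requested, dead, instructions=INSTRUCTIONS):
    """Return the first instruction that realizes `requested`, or None.

    An instruction matches if it performs every requested register transfer
    and any additional transfers it makes only set cells listed in `dead`.
    """
    for assembly, effect in instructions:
        performs_all = all(effect.get(cell) == expr for cell, expr in requested.items())
        extras_harmless = all(cell in dead for cell in effect if cell not in requested)
        if performs_all and extras_harmless:
            return assembly
    return None
```

Because the scan stops at the first match, placing `INC R3` before `ADD #1,R3` is what makes the cheaper instruction win; no cost model is consulted.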
For purposes of comparison, Appendixes B-D show PO optimizing a 30-instruc-
tion program that has been used to illustrate FINAL, the PDP-11-dependent
object code optimizer of the BLISS-11 optimizing compiler [12]. PO yields 19
instructions; by simply combining adjacent instructions, it collects branch chains,
uses special-purpose addressing modes, combines jumps-over-jumps, and deletes
useless tests and unreachable code. FINAL yields 16 instructions, because it does
"cross-jumping," a reordering that can eliminate redundant code. Cross-jumping
may permit other optimizations but, by itself, does not make programs faster,
only smaller. Hence, it differs fundamentally from PO's optimizations; even a
wider window would not help. Cross-jumping could be added to PO, but the
larger need is for a space-optimizer that reduces code size through general
reorderings.
5. CODE GENERATION
PO can greatly reduce the number of cases that a code generator must consider
to produce quality code. Suppose that the early, largely machine-independent
compiler phases produce intermediate postfix code for a simple stack machine.
For example,
i ← i - 1
might be translated to
PUSH i        push the address i
INDIR         replace it with the word it addresses
PUSH 1        push the constant 1
SUB           subtract the 1 from i
PUSH i        push the address i
STORE         store the result in i
Though bulky, this code is easy to generate. Furthermore, it is easy to write
macros that expand it into code for the target machine. For example, the macros
might rewrite the code above in PDP-11 assembly language, simulating the stack
in words a and b:
MOV #i,a      move address i to a
MOV @a,a      replace it with the word it addresses
MOV #1,b      move 1 to b
SUB b,a       subtract b from a
MOV #i,b      move address i to b
MOV a,@b      move a to the word addressed by b
Table I
Function  Machine    Postfix  Before PO  After PO  Host compiler  Hand code
tprint    PDP-11     73       76         19        16             16
ctoi      PDP-10     56       60         28        20             18
ctoi      Cyber 175  56       64         41        38             26
mmult     PDP-11     81       95         41        40             22
mmult     PDP-10     81       84         39        26             19
mmult     Cyber 175  81       93         69        61             27
The macros need know only the most general instructions (e.g., subtract, not
decrement) and the most primitive addressing modes (enough to fetch addresses
and simulate a stack) because PO can introduce better ones. For example, PO
first reduces each of the three pairs above, yielding
MOV i,a       move i to a
DEC a         decrement a
MOV a,i       move a to i
Then it uses a three-instruction window to reduce these to the optimal
DEC i         decrement i
PO needs the larger window when working with naive code generators because,
while many machines offer some one-instruction replacements for load/operate/
store sequences, few offer replacements for the load/operate and operate/store
subsequences; PO must look at all three to reduce them to one. Checking triples
slows PO but does not make it much more complex because it uses the pair-
handling machinery to combine triples. Fortunately, no special need has been
observed for a still larger window or a more complex replacement strategy (e.g.,
replacing triples with equivalent pairs).
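How a three-instruction window can reuse the pair-handling machinery is sketched below. This is an assumption-laden illustration rather than PO's algorithm: `combine` and `select` are hypothetical callables supplied by the caller (the effect of one instruction followed by another, and the single instruction realizing an effect, or `None`), and labels and branches are ignored.

```python
def optimize(effects, combine, select, max_window=3):
    """Collapse adjacent effects, trying pairs first and then triples."""
    out = []
    for effect in effects:
        out.append(effect)
        changed = True
        while changed:                      # after a replacement, back up and retry
            changed = False
            for width in range(2, max_window + 1):
                if len(out) < width:
                    break
                merged = out[-width]
                for nxt in out[-width + 1:]:
                    merged = combine(merged, nxt)   # triples reuse the pair combiner
                if select(merged) is not None:
                    out[-width:] = [merged]
                    changed = True
                    break
    return out

# Toy stand-ins: effects are strings, and only the whole triple has a replacement.
toy_combine = lambda a, b: a + ";" + b
toy_table = {"load;dec;store": "DEC i"}
toy_result = optimize(["load", "dec", "store"], toy_combine, toy_table.get)
```

In the toy run neither `load;dec` nor `dec;store` has a single-instruction replacement, so only the three-instruction window collapses the sequence, which is the situation described above.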
Table I shows how this strategy has performed on larger examples. The
numbers give the sizes (in instructions) of the stack machine code, the target
machine code before and after optimization, and similar code produced by a more
conventional, machine-dependent optimizing compiler and by an assembly lan-
guage expert for three subroutines: tprint prints trees, ctoi converts strings to
integers, and mmult multiplies matrices.
In fact, PO can come even closer: in general, the host compilers did better, not
because of superior case analysis during code generation, but because they assign
crucial variables to registers and perform global optimizations. Because such
improvements are largely machine independent, they could be added to PO's
compiler without making it much harder to retarget.
6. DISCUSSION
PO is a five-page SNOBOL program that runs in 128K bytes on a PDP-11/70; the
program includes the simple one-page code generator outlined in the last section.
It uses a two-page preprocessor to turn machine descriptions into SNOBOL
patterns. Machine descriptions are about two pages and can be written in an hour
or two by someone who knows the machine. This version trades speed for
simplicity; for example, it will look to see if a register transfer matches a
decrement pattern even if it already has failed to match a more general subtract
pattern. Such shortcuts slow PO: it typically processes only 1-10 instructions
each second, and this rate changes linearly with the number of patterns in the
machine description. The design of a production version is underway; for example,
it uses a table-driven pattern matcher that dismisses decrements when it dismisses
subtracts and is relatively insensitive to the size of the machine description.
Preliminary experiments indicate that this version will run fast enough for
everyday use, though conventional, hand-coded peephole optimizers will probably
remain faster.
PO's relative lack of context is also being addressed. Repeated application of
two- and three-instruction windows can increase the effective window size (wit-
ness the reduction of six instructions to one above), but sometimes more context
is needed. For example, PO cannot collapse an otherwise-reducible pair separated
by a third, uncombinable instruction; hand-coded peephole optimizers can. The
production version of PO may use simple data flow analysis to identify such
nonadjacent pairs that are likely candidates for combining.
Experience with PO also suggests reexamining the division of labor between
the global optimizer, register allocator, code generator, and object code optimizer.
PO eliminates (much) unreachable code; perhaps global optimizers should not
bother with this improvement. A recent code generator [5] matches intermediate
code against machine description patterns to guide local register allocation; since
PO does similar matching, perhaps it can allocate registers. Register transfers
resemble the quadruples that many global optimizers use to represent programs;
perhaps a machine-independent, global optimizer [7] can be adapted to accept a
machine description and use its more global view of object code to catch
inefficiencies missed by PO's narrow window.
APPENDIX A. PDP-11 DESCRIPTION
This is about 40 percent of the PDP-11 machine description; only primitive
nonterminals, a few instructions, and some other uninteresting details have been
omitted.
nonterminal   assembler syntax pattern   register transfer pattern
a             Rn                         R[n]
a             (Rn)+                      M[R[n]++]
a             -(Rn)                      M[--R[n]]
a             x(Rn)                      M[R[n]+x]
a             x                          M[x]
a             #x                         x
i             a                          a
i             @a                         M[a]
*d            i                          i
*s            i                          i
inst          TST d                      NZ ← d ? 0;
inst          CMP s,d                    NZ ← s ? d;
inst          CLR d                      d ← 0; NZ ← 0 ? 0;
inst          MOV s,d                    d ← s; NZ ← s ? 0;
inst          INC d                      d ← d + 1; NZ ← d + 1 ? 0;
inst          DEC d                      d ← d - 1; NZ ← d - 1 ? 0;
inst          ASL d                      d ← d * 2; NZ ← d * 2 ? 0;
inst          ASR d                      d ← d / 2; NZ ← d / 2 ? 0;
inst          ADD s,d                    d ← d + s; NZ ← d + s ? 0;
inst          SUB s,d                    d ← d - s; NZ ← d - s ? 0;
inst          BR a                       PC ← a;
inst          Brel a                     PC ← NZ rel 0 → a else PC;
APPENDIX B. TREE PRINTER 1
This is the PDP-11 assembly code produced by the first phases of the BLISS-11
compiler for a program that prints trees. In addition to the effects shown, each
nonbranch sets the condition code according to the value it assigns; TSTs set the
condition code but do nothing else.
              assembly code       effect
(1)           JSR R1,sav3         call sav3
(2)           MOV S+310,R3        R[3] ← M[S+310]
(3)           MOV 12(R5),R2       R[2] ← M[R[5]+12]
(4)           ADD #177776,R3      R[3] ← R[3] - 2
(5)           CLR @R3             M[R[3]] ← 0
(6)   l5:l6:  TST left(R2)        NZ ← M[R[2]+left] ? 0
(7)           BNE l7              PC ← NZ ≠ 0 → l7 else PC
(8)           BR l8               PC ← l8
(9)   l7:     ADD #177776,R3      R[3] ← R[3] - 2
(10)          MOV R2,@R3          M[R[3]] ← R[2]
(11)          MOV left(R2),R2     R[2] ← M[R[2]+left]
(12)          BR l6               PC ← l6
(13)  l8:     MOV info(R2),R1     R[1] ← M[R[2]+info]
(14)          JSR R7,print        call print
(15)  l9:     MOV right(R2),R2    R[2] ← M[R[2]+right]
(16)          TST R2              NZ ← R[2] ? 0
(17)          BEQ l10             PC ← NZ = 0 → l10 else PC
(18)          BR l11              PC ← l11
(19)  l10:    MOV @R3,R2          R[2] ← M[R[3]]
(20)          ADD #2,R3           R[3] ← R[3] + 2
(21)          TST R2              NZ ← R[2] ? 0
(22)          BNE l12             PC ← NZ ≠ 0 → l12 else PC
(23)          BR l13              PC ← l13
(24)  l12:    MOV info(R2),R1     R[1] ← M[R[2]+info]
(25)          JSR R7,print        call print
(26)          BR l14              PC ← l14
(27)  l13:    BR l4               PC ← l4
(28)  l14:    BR l9               PC ← l9
(29)  l11:    BR l5               PC ← l5
(30)  l4:     RTS R7              return
1 With thanks to Elsevier Publishing.
APPENDIX C. PO'S PAIRWISE OPTIMIZATIONS ON TREE PRINTER
In most cases, PO replaces the pair named with one equivalent instruction;
comments note the three reductions involving nonadjacent branch chain members
where the second instruction is retained.
     pair     result instruction      explanation
(a)  4,5      CLR -(R3)               use autodecrement
(b)  7,8      BEQ l8                  remove label l7
(c)  9,10     MOV R2,-(R3)            use autodecrement
(d)  15,16    l9: MOV right(R2),R2    remove TST
(e)  17,18    BNE l11                 remove label l10
(f)  (e),29   BNE l5                  remove label l11, retain 29
(g)  19,20    MOV (R3)+,R2            use autoincrement
(h)  (g),21   MOV (R3)+,R2            remove TST
(i)  22,23    BEQ l13                 remove label l12
(j)  (i),27   BEQ l4                  remove label l13, retain 27
(k)  26,27    BR l14                  27 unreachable without l13
(l)  (k),28   BR l9                   remove label l14, retain 28
(m)  (l),28   BR l9                   28 unreachable without l14
(n)  (m),29   BR l9                   29 unreachable without l11
APPENDIX D. OPTIMIZED TREE PRINTER
Here is the tree printer after PO's optimizations. FINAL's cross-jumping optimization
changes the last branch to go to l8 instead of l9 and eliminates the
second MOV/JSR sequence. This, in turn, allows it one last optimization: the
now-adjacent BEQ and BR can be combined into a BNE.
              assembly code       effect
(1)           JSR R1,sav3         call sav3
(2)           MOV S+310,R3        R[3] ← M[S+310]
(3)           MOV 12(R5),R2       R[2] ← M[R[5]+12]
(4)           CLR -(R3)           M[R[3]-2] ← 0; decrement R[3]
(6)   l5:l6:  TST left(R2)        NZ ← M[R[2]+left] ? 0
(7)           BEQ l8              PC ← NZ = 0 → l8 else PC
(9)           MOV R2,-(R3)        M[R[3]-2] ← R[2]; decrement R[3]
(11)          MOV left(R2),R2     R[2] ← M[R[2]+left]
(12)          BR l6               PC ← l6
(13)  l8:     MOV info(R2),R1     R[1] ← M[R[2]+info]
(14)          JSR R7,print        call print
(15)  l9:     MOV right(R2),R2    R[2] ← M[R[2]+right]
(17)          BNE l5              PC ← NZ ≠ 0 → l5 else PC
(19)          MOV (R3)+,R2        R[2] ← M[R[3]]; increment R[3]
(22)          BEQ l4              PC ← NZ = 0 → l4 else PC
(24)          MOV info(R2),R1     R[1] ← M[R[2]+info]
(25)          JSR R7,print        call print
(26)          BR l9               PC ← l9
(30)  l4:     RTS R7              return
REFERENCES
1. ALLEN, F.E., AND COCKE, J. A catalogue of optimizing transformations. In Design and
Optimization of Compilers, R. Rustin (Ed.), Prentice-Hall, Englewood Cliffs, N.J., 1972, pp. 1-30.
2. BAGWELL, J.T. Local optimizations. SIGPLAN Notices 5, 7 (July 1970), 52-66.
3. BELL, C.G., AND NEWELL, A. Computer Structures: Readings and Examples. McGraw-Hill,
New York, 1971.
4. FORTRAN-10 reference manual. Digital Equipment Corp., Maynard, Mass., 1974.
5. GLANVILLE, S., AND GRAHAM, S.L. A new method for compiler code generation. In Conf. Rec.
5th Annu. ACM Symp. Principles of Programming Languages, 1978, pp. 231-240.
6. HARRISON, W. A new strategy for code generation--The general purpose optimizing compiler.
In Conf. Rec. 4th ACM Symp. Principles of Programming Languages, 1977, pp. 29-37.
7. HECHT, M.S. Flow Analysis of Computer Programs. North-Holland, Amsterdam, 1977.
8. LOWRY, E.S., AND MEDLOCK, C.W. Object code optimization. Commun. ACM 12, 1 (Jan. 1969),
13-22.
9. MCKEEMAN, W.M. Peephole optimization. Commun. ACM 8, 7 (July 1965), 443-444.
10. PDP-11 processor handbook. Digital Equipment Corp., Maynard, Mass., 1975.
11. STANDISH, T.A., HARRIMAN, D.C., KIBLER, D.F., AND NEIGHBORS, J.M. The Irvine program
transformation catalogue. Dep. Information and Computer Science, Univ. California, Irvine, 1976.
12. WULF, W., JOHNSSON, R.K., WEINSTOCK, C.B., HOBBS, S.O., AND GESCHKE, C.M. The Design
of an Optimizing Compiler. American Elsevier, New York, 1975.
Received July 1979; revised December 1979; accepted January 1980
ACM Transactions on Programming Languages and Systems, Vol. 2, No. 2, April 1980.