All content in this area was uploaded by Dominik Klumpp on Sep 09, 2019


Motivation

Problem

Assembler Code

0000: cmp r0, #0
0004: beq 0010
0008: add r0, r0, #1
000c: b 0000
0010: b 0010

Challenges

- unstructured: jumps (goto) instead of structured control flow
- particularly: jumps to computed addresses

Control Flow Graph

Nodes: 0000, 0004, 0008, 000C, 0010

Edges:
0000 -> 0004: cmp r0,#0
0004 -> 0010: z / b 0010
0004 -> 0008: ¬z / nop
0008 -> 000C: add r0,r0,#1
000C -> 0000: b 0000
0010 -> 0010: b 0010

Dominik Klumpp CFG Reconstruction for Assembly 10. September 2019 2 / 10

Motivation

Problem

Assembler Code

0000: bl 0020
0004: bl 0020
0008: b 0008
0020: bx lr

Solution?

Simulate (one) path to location 0020 and read the value of lr.

Control Flow Graph

Nodes: 0000, 0004, 0008, 0020

Edges:
0000 -> 0020: bl 0020
0004 -> 0020: bl 0020
0020 -> 0004: bx lr
0020 -> 0008: bx lr
0008 -> 0008: b 0008

Imprecise!

The paths 0000 - 0020 - 0008 and 0000 - 0020 - 0004 - 0020 - 0004 do not reflect the actual control flow!
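The imprecision can be checked mechanically. The sketch below executes the four-instruction program concretely and compares the result with a graph in which node 0020 has both return addresses as successors. The encoding and all names are illustrative assumptions.

```python
# Program from the slide: `bl` stores the return address in lr, `bx lr` jumps to it.
PROGRAM = {0x0000: ("bl", 0x0020), 0x0004: ("bl", 0x0020),
           0x0008: ("b", 0x0008), 0x0020: ("bx", "lr")}

# Graph in which both return addresses hang off the single node 0020.
IMPRECISE_CFG = {0x0000: {0x0020}, 0x0004: {0x0020},
                 0x0008: {0x0008}, 0x0020: {0x0004, 0x0008}}

def run(program, steps):
    """Concretely execute `steps` instructions and return the visited addresses."""
    pc, lr, visited = 0x0000, None, []
    for _ in range(steps):
        visited.append(pc)
        op, arg = program[pc]
        if op == "bl":
            lr, pc = pc + 4, arg
        elif op == "bx":
            pc = lr
        else:  # "b"
            pc = arg
    return visited

def is_cfg_path(cfg, path):
    """True if every pair of consecutive addresses is an edge of the graph."""
    return all(b in cfg[a] for a, b in zip(path, path[1:]))

actual = run(PROGRAM, 5)              # 0000, 0020, 0004, 0020, 0008
spurious = [0x0000, 0x0020, 0x0008]   # returns to the wrong call site

print(is_cfg_path(IMPRECISE_CFG, actual))    # the graph is sound for this run
print(is_cfg_path(IMPRECISE_CFG, spurious))  # but it also admits the spurious path
```

Both checks succeed: the graph contains the real execution, yet it also contains a path that no execution can take.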

Motivation

Why Control Flow Graphs?

- Verification
- Data Flow & other static analyses

Standard analyses require a control flow graph.

Why Assembler Code?

- WCET (Scheduling)
- Security
- Unverified compilers

Why Precision?

Hypothesis: Precise CFG → fast (and possibly more precise) analyses

Goals

Control Flow Graphs should...

- soundness: ... overapproximate actual program behaviour, i.e. contain all feasible program traces
- precision: ... without being too imprecise, i.e. without having control flow errors

Intuition

A trace (sequence of statements) that does not have a control flow error can be infeasible.
But: The reason is always a violated data dependency, not the control flow.

Data Flow Error

Leaving a loop too early (condition not violated)

Contradicting if-conditions

Control Flow Error

return to incorrect call site

switch jumps to location without case/default label


Approach

Control Flow Reconstruction

Reconstruction:

Input: assembly program p

1. initialize: CFG C, resolver R
2. all nodes resolved?
   - yes: output CFG C
   - no, v unresolved: resolve v (yields resolver $R_v$), set $R \leftarrow R \otimes R_v$, expand & refine C, then repeat step 2
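The outer loop can be sketched as a worklist over unresolved nodes. In the sketch below, a node is a pair (address, known value of lr) and the per-node analysis is replaced by concrete tracking of lr; both choices are simplifying assumptions made for illustration and are not the SMT-based analysis of the talk. The resolver is modelled as a plain dictionary, so the product of resolvers degenerates to a dictionary update.

```python
# Toy program with two calls to the same subroutine (hypothetical encoding).
PROGRAM = {0x0000: ("bl", 0x0020), 0x0004: ("bl", 0x0020),
           0x0008: ("b", 0x0008), 0x0020: ("bx", "lr")}

def resolve_node(node):
    """Stand-in for 'resolve v': successor nodes of (pc, lr) by tracking lr concretely."""
    pc, lr = node
    op, arg = PROGRAM[pc]
    if op == "bl":
        return {(arg, pc + 4)}   # call: remember the return address
    if op == "bx":
        return {(lr, lr)}        # return: jump to the remembered address
    return {(arg, lr)}           # direct branch

def reconstruct(entry, resolve):
    """Resolve unresolved nodes one by one until none remain."""
    cfg = {entry: None}          # None marks an unresolved node
    resolver = {}                # accumulated results, standing in for R
    while True:
        unresolved = [v for v, succ in cfg.items() if succ is None]
        if not unresolved:
            return cfg, resolver         # all nodes resolved: output C
        v = unresolved[0]
        r_v = resolve(v)                 # resolve v
        resolver[v] = r_v                # stand-in for combining R with R_v
        cfg[v] = r_v                     # expand & refine C
        for w in r_v:
            cfg.setdefault(w, None)      # newly discovered nodes start unresolved

cfg, resolver = reconstruct((0x0000, None), resolve_node)
for (pc, lr), succ in cfg.items():
    print(f"{pc:04X} -> {sorted(f'{p:04X}' for p, _ in succ)}")
```

Because location 0020 is split into one node per return address, each copy has exactly one successor. Like the algorithm on the slide, this loop has no termination guarantee in general.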

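Analysis of a single node (based on trace abstraction refinement [1]):

Input: language $L_v$ (traces leading to v + instruction at v)

1. $R_v \leftarrow R_\emptyset$
2. $L_v \subseteq L(R_v)$?
   - yes: output resolver $R_v$
   - no, $\hat\tau \in L_v \setminus L(R_v)$: compute locations (SMT [2]), compute $R_{\hat\tau}$ (SMT), set $R_v \leftarrow R_v \otimes R_{\hat\tau}$, then repeat step 2

[1] Refinement of Trace Abstraction, Heizmann et al. 2009
[2] Boolector 2.0 system description, Niemetz et al. 2015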
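The shape of this refinement loop can be illustrated with heavy simplifications. In the sketch below the language of traces leading to v is a finite list of address tuples rather than a regular language, the SMT queries are replaced by replaying the trace, and the resolver for a trace covers only that one trace (no generalisation). All names are hypothetical.

```python
# Toy program with two calls to the same subroutine (hypothetical encoding).
PROGRAM = {0x0000: ("bl", 0x0020), 0x0004: ("bl", 0x0020),
           0x0008: ("b", 0x0008), 0x0020: ("bx", "lr")}

def compute_locations(trace):
    """Stand-in for the SMT query: replay the trace, return the final jump target."""
    lr, target = None, None
    for pc in trace:
        op, arg = PROGRAM[pc]
        if op == "bl":
            lr, target = pc + 4, arg
        elif op == "bx":
            target = lr
        else:
            target = arg
    return {target}

def analyse_node(traces_to_v):
    """Refine the resolver until every trace leading to v is covered."""
    r_v = {}                                     # empty resolver: covers no trace
    while True:
        uncovered = [t for t in traces_to_v if t not in r_v]
        if not uncovered:                        # inclusion check succeeds
            return r_v
        tau = uncovered[0]                       # counterexample trace
        r_tau = {tau: compute_locations(tau)}    # resolver for this single trace
        r_v = {**r_v, **r_tau}                   # stand-in for the product of resolvers

TRACES_TO_0020 = [(0x0000, 0x0020), (0x0000, 0x0020, 0x0004, 0x0020)]
RESOLVER_0020 = analyse_node(TRACES_TO_0020)
for trace, locations in RESOLVER_0020.items():
    print([f"{a:04X}" for a in trace], "->", [f"{l:04X}" for l in sorted(locations)])
```

A resolver that generalises from one trace to many similar traces is what would let such a loop finish on infinite languages; this sketch omits that step entirely.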

Results

Theoretical Guarantees

Theorem: Soundness

If the algorithm terminates for a binary program p and returns a CFG C, then C over-approximates the behaviour of p.

Theorem / Corollary: Precision

If the algorithm terminates for a binary program p and returns C, and the computed resolver is precise (resp. maps each state to at most one location), then C has no control flow errors.

Results

Empirical Evaluation

Implementation (7k LOC Scala) for ARM v8 32-bit assembler

Positive Results

- Evaluation on custom and standard benchmarks [3]
- Algorithm can analyse (small) realistic programs
- Algorithm produces precise control flow graphs

Future Work

- Scalability is an open problem, e.g. more efficient SMT usage, heuristics to resolve nondeterminism
- Termination not guaranteed (in particular for recursive programs)
- Evaluation: What advantage does precision have in practice?

[3] The Mälardalen WCET Benchmarks: Past, Present and Future, Gustafsson et al. 2010

Results

[Figure: two reconstructed control flow graphs. Nodes are addresses (10000 to 100F4) and edges are labelled with instructions and branch conditions. In the first program, main dispatches through a computed jump, ldr pc,[pc,r3,lsl2], guarded by (or (not c) z). In the second program, main calls binary_search.]


Backup Slides

Quality Criteria: Formalisation (sketched)

Definition

Trace: Sequence $\hat\tau = (l_1{:}\iota_1, \ldots, l_n{:}\iota_n)$ of addresses $l_i$ with corresponding instructions $\iota_i$.

$\hat\tau$ is feasible (in program p) iff there exists a sequence $(s_1, \ldots, s_{n+1})$ with $s_1$ initial (in p) and $s_i(pc) = l_i$ and $s_{i+1} = \llbracket \iota_i \rrbracket (s_i) \neq \bot$ for $i \in \{1, \ldots, n\}$.

Definition

A trace $\hat\tau = (l_1{:}\iota_1, \ldots, l_n{:}\iota_n)$ has a control flow error iff it has a prefix $\hat\rho = (l_1{:}\iota_1, \ldots, l_k{:}\iota_k)$ (for $k \in \{0, \ldots, n-1\}$), such that $\hat\rho$ is feasible, but cannot reach $l_{k+1}$.
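Both definitions can be made concrete on a toy machine. The sketch below assumes a state is a dictionary with pc, lr, r0 and z, that conditional branches appear in traces as guarded statements (as on the CFG edges, e.g. "z / b 0010"), and that the set of initial states is finite and given explicitly. The instruction encoding and function names are hypothetical.

```python
def step(instr, s):
    """Toy semantics of one statement: the successor state, or None for bottom."""
    op, *args = instr
    t = dict(s)
    if op == "bl":                      # call: save the return address, then jump
        t["lr"], t["pc"] = s["pc"] + 4, args[0]
    elif op == "bx_lr":                 # return: lr must be word-aligned
        if s["lr"] % 4 != 0:
            return None
        t["pc"] = s["lr"]
    elif op == "b":                     # direct branch
        t["pc"] = args[0]
    elif op == "cmp_r0":                # compare r0 with an immediate, set Z
        t["z"], t["pc"] = (s["r0"] == args[0]), s["pc"] + 4
    elif op == "z/b":                   # guarded branch: undefined unless Z holds
        if not s["z"]:
            return None
        t["pc"] = args[0]
    else:
        raise ValueError(f"unknown statement {op}")
    return t

def end_states(trace, initial_states):
    """States reached by executing the whole trace from some initial state."""
    result = []
    for s in initial_states:
        for addr, instr in trace:
            if s is None or s["pc"] != addr:   # pc must match the trace address
                s = None
                break
            s = step(instr, s)
        if s is not None:
            result.append(s)
    return result

def feasible(trace, initial_states):
    return bool(end_states(trace, initial_states))

def has_control_flow_error(trace, initial_states):
    """Some feasible prefix can never end at the address that follows it."""
    for k in range(len(trace)):
        ends = end_states(trace[:k], initial_states)
        if ends and all(s["pc"] != trace[k][0] for s in ends):
            return True
    return False

INITS = [{"pc": 0x0000, "lr": 0, "r0": 1, "z": False}]
wrong_return = [(0x0000, ("bl", 0x0020)), (0x0020, ("bx_lr",)), (0x0008, ("b", 0x0008))]
failed_guard = [(0x0000, ("cmp_r0", 0)), (0x0004, ("z/b", 0x0010))]

print(has_control_flow_error(wrong_return, INITS))   # True: return to wrong call site
print(feasible(failed_guard, INITS))                  # False: the guard z does not hold
print(has_control_flow_error(failed_guard, INITS))    # False: a data flow error only
```

The second trace is infeasible only because the branch condition fails for r0 = 1, so it counts as a data flow error, while the first trace returns to an address that lr can never hold.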


Backup Slides

Analysis of a Single Trace

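Trace

{true}
0000: bl 0020
{lr = 0004}
0020: cmp r0, #0
{lr = 0004}
0024: beq 0030
{lr = 0004}
0030: bx lr
{pc = 0004}

Formulae (one row per instruction of the trace)

$pc_0 = 0000 \qquad lr_1 = pc_0 + 4 \qquad pc_1 = 0020$
$pc_1 = 0020 \qquad Z_1 = (r0_0 = 0) \qquad pc_2 = pc_1 + 4$
$pc_2 = 0024 \qquad Z_1 \qquad pc_3 = 0030$
$pc_3 = 0030 \qquad lr_1 \equiv 0 \bmod 4 \qquad pc_4 = lr_1$

Solutions for $pc_4$: $\{0004\}$

Resolver

States: true, lr = 0004, pc = 0004 (locations: {0004})

Transitions:
true -> lr = 0004: 0000: bl 0020
lr = 0004 -> lr = 0004: 0020: cmp r0, #0, 0024: beq 0030, 0028: add r0, r0, #1, 002c: b 0020
lr = 0004 -> pc = 0004: 0030: bx lr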
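The formulae for this trace are small enough to check by hand or by enumeration. The sketch below stands in for the SMT query: it enumerates a few values of the only free variable (the initial value of r0), derives the remaining variables from the defining equations, and collects every value of the final program counter that satisfies all constraints. The variable names mirror the formulae; the candidate range is an arbitrary assumption.

```python
def solutions_for_pc4(r0_candidates=range(4)):
    """Enumerate initial r0 values and collect all consistent values of pc4."""
    found = set()
    for r0_0 in r0_candidates:
        pc0 = 0x0000
        lr1, pc1 = pc0 + 4, 0x0020        # 0000: bl 0020
        z1, pc2 = (r0_0 == 0), pc1 + 4    # 0020: cmp r0, #0
        pc3 = 0x0030                      # 0024: beq 0030 (branch taken)
        pc4 = lr1                         # 0030: bx lr
        constraints = [
            pc1 == 0x0020,                # cmp is executed at 0020
            pc2 == 0x0024,                # beq is executed at 0024
            z1,                           # the branch is only taken if Z holds
            pc3 == 0x0030,                # bx is executed at 0030
            lr1 % 4 == 0,                 # return address is word-aligned
        ]
        if all(constraints):
            found.add(pc4)
    return found

print([f"{pc:04X}" for pc in solutions_for_pc4()])
```

Only r0 = 0 satisfies the branch condition, and the return address does not depend on r0, so the single solution is 0004, matching the slide.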
