Concolic Program Repair
Ridwan Shariffdeen
National University of Singapore
Singapore
ridwan@comp.nus.edu.sg
Yannic Noller
National University of Singapore
Singapore
yannic.noller@acm.org
Lars Grunske
Humboldt-Universität zu Berlin
Germany
grunske@informatik.hu-berlin.de
Abhik Roychoudhury
National University of Singapore
Singapore
abhik@comp.nus.edu.sg
Abstract
Automated program repair reduces the manual effort in fixing program errors. However, existing repair techniques modify a buggy program such that it passes given tests. Such repair techniques do not discriminate between correct patches and patches that overfit the available tests (breaking untested but desired functionality). We propose an integrated approach for detecting and discarding overfitting patches via systematic co-exploration of the patch space and input space. We leverage concolic path exploration to systematically traverse the input space (and generate inputs), while ruling out significant parts of the patch space. Given a long enough time budget, this approach allows a significant reduction in the pool of patch candidates, as shown by our experiments. We implemented our technique in the form of a tool called ‘CPR’ and evaluated its efficacy in reducing the patch space by discarding overfitting patches from a pool of plausible patches. We evaluated our approach for fixing real-world software vulnerabilities and defects, for fixing functionality errors in programs drawn from SV-COMP benchmarks used in software verification, as well as for test-suite guided repair. In our experiments, we observed a patch space reduction due to our concolic exploration of up to 74% for fixing software vulnerabilities and up to 63% for SV-COMP programs. Our technique presents the viewpoint of gradual correctness: repair run over a longer time leads to fewer overfitting fixes.
CCS Concepts: Software and its engineering → Software testing and debugging.
Keywords: program repair, symbolic execution, program synthesis, patch overfitting
Joint first authors
PLDI ’21, June 20–25, 2021, Virtual, Canada
©2021 Association for Computing Machinery.
This is the author’s version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI ’21), June 20–25, 2021, Virtual, Canada, https://doi.org/10.1145/3453483.3454051.
ACM Reference Format:
Ridwan Shariffdeen, Yannic Noller, Lars Grunske, and Abhik Roychoudhury. 2021. Concolic Program Repair. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI ’21), June 20–25, 2021, Virtual, Canada. ACM, New York, NY, USA, 16 pages. https://doi.org/10.1145/3453483.3454051
1 Introduction
Automated Program Repair [14, 24] is an emerging technology which seeks to rectify errors or vulnerabilities in software automatically. There are various applications of automated repair, including improving programmer productivity, reducing exposure to software security vulnerabilities, producing self-healing software systems, and even enabling intelligent tutoring systems for teaching programming.
Since program repair needs to be guided by certain notions of correctness and formal specifications of the program’s behavior are usually not available, it is common to use test-suites to guide repair. The goal of automated repair is then to produce a (minimal) modification of the program so as to pass the tests in the given test-suite. While test-suite driven repair provides a practical formulation of the program repair problem, it gives rise to the phenomenon of “overfitting” [26, 30]. The patched program may pass the tests in the given test-suite while failing tests outside the test-suite, thereby overfitting the test data. Such overfitting patches are called plausible patches because they repair the failing test case(s), but they are not guaranteed to be correct, since they may fail tests outside the test-suite guiding the repair. Various solutions to alleviate the patch overfitting issue have been studied to date, including symbolic specification inference [23, 25], machine learning-based prioritization of patches [2, 20, 21] and fuzzing based test-suite augmentation [7]. These works do not guarantee any notion of correctness of the patches, and cannot guarantee even the most basic correctness criteria such as crash freedom.
In this work, we reflect on the problem of patch overfitting [22, 26, 30], in our attempt to produce patches which work for a large number of test inputs. Our goal is to devise an anytime patching algorithm; the algorithm can be stopped at any time. However, the longer it is run, the greater is the coverage of the input space, and the greater is our confidence that the patch produced works for a large class of test inputs. To ensure coverage of the test input space, we use concolic path exploration for automated test generation. Use of symbolic and concolic execution for test generation is well-known [4, 9]; symbolic execution has also been used in automated repair for computing repair constraints [25]. At the same time, our usage of concolic execution is innovative, and is the key technical contribution of this paper.
We use concolic execution [9] to generate test inputs, and additionally to generate constraints for the patch refinement, to make the patches work for those test inputs. We leverage a user-provided specification to detect incorrect behavior for the generated test inputs. Such a specification does not need to be a full specification with regard to the program’s correctness. Partial specifications like an assertion at a specific location, or the absence of crashes in a specific location, can already be sufficient to detect overfitting patches. Our outlook is to use concolic execution for computing path constraints and patch constraints at the same time. By making the symbolic execution technology serve such a dual purpose, we can systematically traverse a large portion of the test input space, and find out patch patterns which work for those traversed test inputs. Given a longer time budget, we obtain greater path coverage, and rule out a large number of patch candidates, thereby reducing overfitting in program repair.
Realizing such a dual-purpose usage of symbolic execution requires us to overcome many technical challenges. First, our symbolic execution engine needs to compute path constraints containing both input variables and patch variables. Though the patch variables are higher order variables, we avoid developing a second order symbolic execution engine for scalability reasons. Instead we provide a first order encoding of path constraints and patch constraints which contain (first order) input variables along with certain additional parameters to succinctly represent sets of patches. Secondly, and more importantly, there are additional sources of path infeasibility in our setup as compared to traditional concolic/symbolic execution. In traditional concolic execution, a path is deemed infeasible if the path constraint is unsatisfiable. In our setup, the path contains a hole for the patch location, and we maintain a pool of patch candidates which diminishes as more paths are explored. Hence if none of the remaining patch candidates can be inserted into the patch location, we also deem the path as infeasible.
The benefits of our concolic approach for patch generation are shown by the experimental evaluation of its efficacy in repairing a large set of security vulnerabilities curated in recent works [8] based on Google’s OSS-Fuzz infrastructure. The tool embodying our concolic program repair approach is called CPR, an abbreviation indicating the resuscitation of programs via appropriate fixes.¹

    ........
    250 static int
    251 cvtRaster(TIFF* tif, uint32* raster, uint32 width, uint32 height)
    252 {
    253     uint32 y;
    254     tstrip_t strip = 0;
    255     tsize_t cc, acc;
    256     unsigned char* buf;
    257     uint32 rwidth = roundup(width, horizSubSampling);
    258     uint32 rheight = roundup(height, vertSubSampling);
    259     uint32 nrows = (rowsperstrip > rheight ? rheight : rowsperstrip);
    260     uint32 rnrows = roundup(nrows, vertSubSampling);
    261     if (CONDITION) return 0;
    262     /* potential divide-by-zero error */
    263     cc = rnrows * rwidth + 2 * ((rnrows * rwidth) / (horizSubSampling * vertSubSampling));
    ........
    278 }

Listing 1. CVE-2016-3623: Divide by Zero in LibTIFF v4.0.6
Novelty and Contributions. Overall, we provide two key novelties in program repair: (1) the concept of simultaneous exploration of input and patch space, and (2) the alleviation of patch overfitting by checking for a user-provided specification during concolic exploration. We propose the path exploration in concolic execution as a mechanism to traverse the program input space and patch space simultaneously. The main contribution is to tackle patch overfitting, which is a key problem in the area of automated program repair [26, 30]. Our repair tool CPR generates correct patches for a variety of specifications or oracles including crash-freedom (absence of observable vulnerabilities), and satisfaction of assertions, as shown by our experiments.
2 Illustrative Example
In this section we show the advantages of concolic program repair by illustrating its usage for the repair of a security vulnerability in a real-world application. We make use of the security vulnerability reported as CVE-2016-3623 discovered in the LibTIFF library v4.0.6 (see Listing 1). LibTIFF is a popular open-source library that provides support for the Tag Image File Format (TIFF), a widely used format for storing image data. CVE-2016-3623 represents a divide-by-zero vulnerability, which allows a remote attacker to cause a denial of service by setting malicious inputs to the program rgb2ycbcr. Listing 1 depicts the relevant code snippet, which could lead to a divide-by-zero error at line 263 if the two variables horizSubSampling and vertSubSampling are not sanitized for invalid inputs. We have added a fix template in line 261, where the condition can be generated using most state-of-the-art repair tools.
Repair process. Concolic program repair works at a high level in three phases: (1) patch pool construction, (2) path exploration, and (3) patch reduction. Phases (2) and (3) are performed in an alternating manner: the path exploration provides input partitions (in the form of path constraints), and the patch reduction refines abstract patches and rules out patches that fail the user-provided specification for the current input partition.

¹ Resuscitating a program, like what Cardio-pulmonary Resuscitation (CPR) does to a patient.

[Figure 1: for each exploration step, the covered input partitions (Input Space), the remaining patches (Patch Space), and the Patch Details tabulated here. x = horizSubSampling, y = vertSubSampling, C = CONDITION.]

Input partitions (path constraints):
  P1: x > 3 ∧ y ≤ 5 ∧ ¬C
  P2: x ≤ 3 ∧ y > 5 ∧ ¬C
  P3: x ≤ 3 ∧ y ≤ 5 ∧ ¬C
  P4: x > 3 ∧ y > 5 ∧ C

Step I (initial test input x=7, y=0; 69 concrete patches in total)
  ID | Patch Template   | Parameter Constraint                                     | # Conc. Patches
  1  | x >= a           | a ≥ -10 ∧ a ≤ 7                                          | 18
  2  | y < b            | b ≥ 1 ∧ b ≤ 10                                           | 10
  3  | x == a || y == b | (a=7 ∧ b ≥ -10 ∧ b ≤ 10) ∨ (b=0 ∧ a ≥ -10 ∧ a ≤ 10)      | 41

Step II (P1; 46 concrete patches in total)
  ID | Patch Template   | Parameter Constraint                                     | # Conc. Patches
  1  | x >= a           | a ≥ -10 ∧ a ≤ 4                                          | 15
  2  | y < b            | b ≥ 1 ∧ b ≤ 10                                           | 10
  3  | x == a || y == b | b=0 ∧ a ≥ -10 ∧ a ≤ 10                                   | 21

Step III (P2; 12 concrete patches in total)
  ID | Patch Template   | Parameter Constraint                                     | # Conc. Patches
  1  | x >= a           | a ≥ -10 ∧ a ≤ 0                                          | 11
  2  | y < b            | False                                                    | 0
  3  | x == a || y == b | a = 0 ∧ b = 0                                            | 1

Step IV (P3; 1 concrete patch in total)
  ID | Patch Template   | Parameter Constraint                                     | # Conc. Patches
  1  | x >= a           | False                                                    | 0
  3  | x == a || y == b | a = 0 ∧ b = 0                                            | 1

Step V (P4, not explored; 1 concrete patch in total, the correct patch)
  ID | Patch Template   | Parameter Constraint                                     | # Conc. Patches
  3  | x == a || y == b | a = 0 ∧ b = 0                                            | 1

Figure 1. Illustrative concolic exploration for example CVE-2016-3623 in Listing 1 as the simultaneous exploration of the input space and the patch space. The rows I, II, III, IV, and V represent multiple exploration steps. The columns show the increasingly covered Input Space, the decreasing Patch Space, as well as more details on the identified patches. The patch space is in general limited by the synthesis language (denoted by the rectangle around the patch space illustration). The number on the top right of the patch space illustration denotes the total number of concrete patches included in this patch space.
Illustration. Figure 1 illustrates the simultaneous space reduction (i.e., the interplay between path exploration and patch reduction): as we explore the input space, we are able to narrow down and refine the patch space (steps I, II, III, and IV), while at the same time we leverage the patch space to skip parts of the input space, which are not feasible with the available patches (step V). Therefore, each row I, II, III, IV, and V in Figure 1 represents an exploration step, which represents an increase of the input space coverage and a potential reduction of the patch space. The input space for this example is partitioned into 4 compartments P1, P2, P3, and P4, which are defined by the corresponding path constraints. Note that the constraints in Figure 1 show only the relevant parts for this example and further assume a control location, which compares the relevant variables horizSubSampling and vertSubSampling with the given constants. These path constraints are chosen artificially for this example (since details of roundup are not shown). As mentioned in Figure 1, we refer to horizSubSampling and vertSubSampling as $x$ and $y$ respectively as a notational short-hand. Our patch space is generally limited by the synthesis language (denoted by the rectangle around the patch space illustration). In order to illustrate the overall reduction in terms of concrete patches, the box in the top right corner of the patch space shows the total number of concrete patches included in this patch space. Please note that Figure 1 does not show the exploration of all possible input partitions, and hence, shows only a part of the input exploration for illustration purposes.
Patch pool construction. In this example, our approach starts with synthesizing a set of plausible patches based on an initial test case with x=7, y=0 (see step I in Figure 1). We assume that the user-defined specification states that there should be no divide-by-zero error at line 263 in Listing 1, i.e., that $x \cdot y \neq 0$. The set of plausible patches is shown as the oval in the Patch Space column. Note that we assume that the correct patch is included in this set. The table on the right side of Figure 1 shows an illustrative list of patch templates (aka abstract patches) generated by our synthesizer. As abstract patches we consider boolean and integer expressions, which include program variables (e.g., $x$ and $y$) and parameters (e.g., $a$ and $b$). During the repair process, the parameter values are captured by a certain constraint (see column Parameter Constraint), which covers a set of concrete patches and limits the search space. The column # Conc. Patches shows how many concrete patches are covered by the corresponding abstract patch. For this illustrative example, we assume that the parameter values are initially in the range [-10, 10]. The constraints shown in the table are already modified by the synthesizer to pass the initial test case. In the following paragraphs, we will provide more detailed information on the interplay between path exploration and patch reduction.
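The counts in the step I table can be reproduced with a small brute-force sketch. This is not how CPR's synthesizer works; it simply enumerates the assumed parameter range [-10, 10] and keeps the values for which the initial test x=7, y=0 passes. The names `TEMPLATES`, `passes`, and `initial_pool` are hypothetical helpers introduced for illustration.

```python
from itertools import product

PARAM_RANGE = range(-10, 11)  # parameter range [-10, 10] assumed in the example

# (number of parameters, guard) for the three patch templates of the example;
# p holds the parameter values of the template.
TEMPLATES = {
    1: (1, lambda x, y, p: x >= p[0]),               # x >= a
    2: (1, lambda x, y, p: y < p[0]),                # y < b
    3: (2, lambda x, y, p: x == p[0] or y == p[1]),  # x == a || y == b
}

def passes(guard, x, y, p):
    """A test passes if the guard returns early or the division is safe."""
    return guard(x, y, p) or x * y != 0

def initial_pool(x, y):
    """Map each template ID to the parameter values that pass the test (x, y)."""
    pool = {}
    for pid, (arity, guard) in TEMPLATES.items():
        candidates = product(PARAM_RANGE, repeat=arity)
        pool[pid] = {p for p in candidates if passes(guard, x, y, p)}
    return pool

pool = initial_pool(7, 0)
counts = {pid: len(params) for pid, params in pool.items()}
print(counts, sum(counts.values()))  # {1: 18, 2: 10, 3: 41} 69
```

The abstract-patch representation avoids exactly this kind of enumeration: a single parameter constraint stands for all 18, 10, or 41 concrete patches at once.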
Input partition P1 for patch 1. Starting with the initial input, concolic execution provides us with the input partition P1 (defined by the corresponding path constraint). Step II in Figure 1 represents the first repair iteration. For every abstract patch, we check whether a violation of the specification is feasible with the current path constraint. If yes, we try to refine the constraint on the parameter values. The light-grey shaded area in the patch space indicates the refinement to the patch space as we explore the respective path of P1. In order to refine patch 1, we search for models of:
$$\underbrace{x > 3 \wedge y \le 5 \wedge \neg(x \ge a) \wedge a \in [-10, 7]}_{\text{path constraint P1 complemented with patch 1}} \wedge \underbrace{(x \cdot y = 0)}_{\text{condition for specification violation}}$$
Every satisfying assignment reveals a possibility to violate the specification with the current path constraint and patch 1. In order to make this formula unsatisfiable, we need to remove the values $\{5, 6, 7\}$ from the constraint on $a$. Therefore, the refined variant of patch 1 is: $x \ge a$ with $a \in [-10, 4]$ (see table on the right side of row II in Figure 1). This refinement removes 3 concrete patches from the patch space.
Input partition P1 for patch 2. In order to test patch 2 on the input partition P1, we again first check whether it is possible to violate the specification with the current path constraint and patch 2. The formula to test would be: $x > 3 \wedge y \le 5 \wedge \neg(y < b) \wedge b \in [1, 10] \wedge (x \cdot y = 0)$. However, this formula is unsatisfiable, and hence, patch 2 cannot be refined further in this step (there is no violating parameter value to remove).
Input partition P1 for patch 3. For patch 3 we need to test: $x > 3 \wedge y \le 5 \wedge \neg(x = a \vee y = b) \wedge ((a = 7 \wedge b \in [-10, 10]) \vee (b = 0 \wedge a \in [-10, 10])) \wedge (x \cdot y = 0)$. For this formula, only $y = 0$ is the feasible condition for a violation. Therefore, all parameter value combinations for which $b \neq 0$ are models for a specification violation and need to be removed from the parameter constraint during refinement. The resulting parameter constraint is: $(a = 7 \wedge b \in [0]) \vee (b = 0 \wedge a \in [-10, 10])$, which can be simplified to $b = 0 \wedge a \in [-10, 10]$.
Exploration of P2 and P3. In order to generate a new input, the current path constraint of P1 can be mutated, e.g., by flipping constraints in P1 (as in concolic execution), and solved with an SMT solver. For example, we could retrieve the input x=0, y=6 corresponding to the path constraint P2: $x \le 3 \wedge y > 5 \wedge \neg C$ (see step III in Figure 1). While exploring P2, the parameter constraint in patch 1 can be refined to $a \in [-10, 0]$. Patch 2 does violate the specification for P2 for all available parameter values. Therefore, patch 2 cannot be refined and needs to be removed in step III. Finally, the parameter constraint in patch 3 can be refined to $a = 0 \wedge b = 0$, i.e., there is only one concrete mapping left for this patch. In fact, patch 3 now is semantically equivalent to the correct patch. Step IV in Figure 1 shows one final step, where patch 1 can be removed and patch 3 remains as the correct patch.
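The whole reduction sequence (steps II to IV) can be replayed with a brute-force sketch. Two assumptions are ours and not part of the approach: the unbounded inputs are replaced by the finite stand-in [-20, 20], and the solver-based refinement is replaced by enumeration. A parameter value survives a partition only if the guard fires on every input of that partition that would otherwise divide by zero.

```python
from itertools import product

DOMAIN = range(-20, 21)   # assumed finite stand-in for the (unbounded) inputs
R = range(-10, 11)        # parameter range of the example

GUARDS = {
    1: lambda x, y, p: x >= p[0],                # x >= a
    2: lambda x, y, p: y < p[0],                 # y < b
    3: lambda x, y, p: x == p[0] or y == p[1],   # x == a || y == b
}

# Patch pool after step I, i.e. plausible for the initial test x=7, y=0.
pool = {
    1: {(a,) for a in R if a <= 7},
    2: {(b,) for b in R if b >= 1},
    3: {(a, b) for a, b in product(R, R) if a == 7 or b == 0},
}

PARTITIONS = [
    ("P1", lambda x, y: x > 3 and y <= 5),
    ("P2", lambda x, y: x <= 3 and y > 5),
    ("P3", lambda x, y: x <= 3 and y <= 5),
]

def reduce_pool(pool, in_partition):
    """Drop parameter values that admit a violation on the not-C path."""
    crashing = [(x, y) for x, y in product(DOMAIN, DOMAIN)
                if in_partition(x, y) and x * y == 0]
    reduced = {}
    for pid, params in pool.items():
        # A value survives only if the guard fires on every crashing input.
        kept = {p for p in params
                if all(GUARDS[pid](x, y, p) for x, y in crashing)}
        if kept:
            reduced[pid] = kept
    return reduced

def size(pool):
    return sum(len(params) for params in pool.values())

totals = [size(pool)]
for name, in_partition in PARTITIONS:
    pool = reduce_pool(pool, in_partition)
    totals.append(size(pool))
print(totals)  # [69, 46, 12, 1]
print(pool)    # {3: {(0, 0)}}
```

The totals match the numbers shown for the patch space in Figure 1, and the only surviving concrete patch is `x == 0 || y == 0`.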
Non-Exploration of P4. Step V in Figure 1 shows the consideration of P4 with the path constraint $x > 3 \wedge y > 5 \wedge C$. One of our key ingredients is that, when generating a new input, we ensure the feasibility of the corresponding path constraint by selecting an appropriate patch from our patch pool. The above mentioned path constraint for P4 is satisfiable; however, our approach would not explore it because there is no patch in the current patch pool which would allow taking this path.
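The feasibility check behind this decision can be sketched as follows, again by enumeration over an assumed finite input domain rather than with an SMT solver. The function `path_is_explorable` is a hypothetical name: it asks whether some patch in the pool evaluates to the branch outcome that the path requires for some input of the partition.

```python
from itertools import product

DOMAIN = range(-20, 21)  # assumed finite stand-in for the inputs

def remaining_patch(x, y):
    """The only patch left after step IV: x == 0 || y == 0."""
    return x == 0 or y == 0

def path_is_explorable(in_partition, wants_condition, patches):
    """True if some input of the partition takes the wanted branch under some patch."""
    for x, y in product(DOMAIN, DOMAIN):
        if in_partition(x, y) and any(p(x, y) == wants_condition for p in patches):
            return True
    return False

def p4(x, y):
    return x > 3 and y > 5

print(path_is_explorable(p4, True, [remaining_patch]))   # False: C cannot hold on P4
print(path_is_explorable(p4, False, [remaining_patch]))  # True: the not-C path remains
```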
Advantages of concolic program repair. Our approach has the major advantage of exploring both spaces, input and patch, simultaneously, saving a significant cost in terms of time and space enumeration: (1) we refine the patch space based on the exploration in the input space, while (2) we also can rule out parts of the input space which contradict the patch space. We are able to reason about a large portion of concrete patches with every single iteration of concolic execution by using abstractions in the patch space. For example, with three repair steps (II, III, and IV) we can reduce the patch space by 68 concrete patches. In general, the more paths we explore, the better the refinement would be, thus finding the most accurate patch. Furthermore, by focusing on the obtained path constraint rather than only on specific inputs, we are able to test a large portion of the input space captured by an input partition. Additionally, as illustrated in our example, our approach performs some path reduction: during concolic exploration, we make sure that for every newly generated input, there is at least one patch in the current patch pool which can exercise the corresponding path. Otherwise, the path will not be explored.
In conclusion, these advantages allow us to reduce the
pool of candidate expressions, as compared to existing state-
of-the-art techniques like counterexample-guided inductive
synthesis (CEGIS) [31,32] and ExtractFix [8].
3 Methodology
In this work we propose a concolic program repair technique, which incrementally explores the input space, while refining the patch space.
3.1 Patch Definition
Our technique supports two notions of patches: concrete and abstract. An abstract patch represents a patch template, which contains parameters that can have values satisfying a specified constraint. Concrete patches do not include such parameters. Our methodology focuses on abstract patches because, having abstract patches, the repair process needs to generate and maintain a smaller number of patch candidates. Furthermore, the patch space reduction can attempt to refine the parameter constraints before discarding a patch. Therefore, we define a patch $\rho$ as the 3-tuple $(\theta_\rho, T_\rho, \psi_\rho)$ with the set of program variables $X_P$, the corresponding subset of input variables $X \subseteq X_P$, and the set of template parameters $A$:
• $\theta_\rho(X_P, A)$ denotes the repaired (boolean or integer) expression
• $T_\rho(A)$ represents the conjunction of constraints $\tau_\rho(a_i)$ on the parameters $a_i \in A$ included in $\theta_\rho$: $T_\rho(A) = \bigwedge_{a_i \in A} \tau_\rho(a_i)$
• $\psi_\rho(X, A)$ is the patch formula induced by inserting the expression $\theta_\rho$ into the buggy program
This patch definition covers both notions, abstract and concrete. For concrete patches either the set of parameters $A$ is empty and $T_\rho$ is trivially True, or the constraints on the parameters $a_i \in A$ allow only one concrete value each.
Example. Assume there is a buggy location in a program like if($\rho$) then .. else .., where the patch $\rho$ is included in the if condition. Then a repaired expression could be $\theta_\rho := x > a$ with the parameter value constraint $T_\rho = \tau_\rho(a) := (a \ge -10 \wedge a \le 10)$ and the corresponding patch formula $\psi_\rho := x > a$.
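A minimal data-structure sketch of this definition is given here, under our own simplifying assumptions: the expression is kept as a string, each per-parameter constraint is an integer interval, and the patch formula is left out. The class `Patch` and its methods are hypothetical names, not CPR's implementation.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, Iterator

@dataclass
class Patch:
    theta: str                                           # repaired expression
    tau: Dict[str, range] = field(default_factory=dict)  # tau(a_i) per parameter

    def T(self, assignment: Dict[str, int]) -> bool:
        """Conjunction of the per-parameter constraints."""
        return all(assignment[name] in allowed
                   for name, allowed in self.tau.items())

    def is_concrete(self) -> bool:
        """No parameters, or exactly one admissible value per parameter."""
        return all(len(allowed) == 1 for allowed in self.tau.values())

    def concrete_assignments(self) -> Iterator[Dict[str, int]]:
        """Enumerate the concrete patches covered by this abstract patch."""
        names = list(self.tau)
        for values in product(*(self.tau[name] for name in names)):
            yield dict(zip(names, values))

abstract = Patch("x > a", {"a": range(-10, 11)})
concrete = Patch("x > 0")
print(len(list(abstract.concrete_assignments())))        # 21
print(abstract.is_concrete(), concrete.is_concrete())    # False True
```

With an empty parameter set, `T` is trivially true and exactly one (empty) assignment is produced, which mirrors the concrete-patch case of the definition.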
Patch Formula. In our notation $\psi_\rho$ does not represent the patch expression but rather the constraint induced by the patch. For our approach a patch is technically represented as an expression tree, which can be transformed into an SMT formula by considering the semantics of the operators (or components) appearing in the expression $\theta_\rho$. The information about the patch location (i.e., where the repaired expression will be inserted) and the transformed expression tree is what we call the patch formula. Therefore, if the patch represents the right hand-side of an assignment like y=$\rho$ with $\theta_\rho := x - a$, then the patch formula is derived as $\psi_\rho := y = x - a$, using the patch context information. We acknowledge that such a patch formula is generally not required for the definition of a patch. In fact, the patch formula can be derived from combining the information about the patch location and the patch expression (see Section 3.5). However, our approach technically requires such an artifact in order to reason about the patch.
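As a rough illustration of deriving a patch formula from an expression tree and its context, the following sketch renders nested tuples as SMT-LIB style terms. The tuple encoding, the helper names `to_smt` and `patch_formula`, and the choice of subtraction in the assignment example are assumptions made for illustration only.

```python
def to_smt(tree):
    """Render an expression tree (nested tuples) as an SMT-LIB style term."""
    if isinstance(tree, tuple):
        op, *args = tree
        return "(" + " ".join([op] + [to_smt(arg) for arg in args]) + ")"
    return str(tree)

def patch_formula(theta, target=None):
    """Combine the expression with its context.

    For an assignment `target = rho` the formula constrains the target;
    for a branch condition the expression itself is the formula.
    """
    body = to_smt(theta)
    return f"(= {target} {body})" if target is not None else body

print(patch_formula((">", "x", "a")))               # (> x a)
print(patch_formula(("-", "x", "a"), target="y"))   # (= y (- x a))
```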
3.2 Overview: Concolic Repair Algorithm
As input, our approach requires the buggy program, a repair budget, the fault locations, a user specification, the language components for the synthesis, and optionally, a set of initial test cases. The user specification identifies a constraint on the desired program behavior (in addition to satisfying the given test cases). It does not need to be a complete formal specification of the correct program behavior, but represents a constraint on the expected observation, provided as a logical formula. For example, the user can assert crash-freedom or some specific logical behavior (e.g., a constraint on the resulting output). If no error-exposing input is available, we need to generate at least one failing input (with regard to the user-provided specification) to start the concolic exploration. For this purpose, we can use offline techniques like Directed Greybox Fuzzing [3]. Note that the generation of the one failing test is a pre-processing step for our technique. Otherwise, we assume that at least one failing test is available, which our method seeks to repair, apart from making sure that the user-provided specification holds for all paths traversed via concolic exploration.
As output our approach produces a set of patches, which satisfy the initial test case (repairing the given failing test case, if one is available) and which do not violate the given specification for (a subset of, depending on the repair budget) the other paths of the program. The patches are ranked based on the evidence we see during input space exploration.
Algorithm 1: General Concolic Repair
    Input: set of initial test cases I, buggy locations L = (patchLoc, bugLoc), budget b, specification σ, language components C
    Output: set of ranked patches P
    1  P ← Synthesize(C, I, L)
    2  while P ≠ ∅ and CheckBudget(b) do
    3      t, ρ ← PickNewInput(P)
    4      if no input t available then
    5          return P
    6      end
    7      φ_t, hit_patch, hit_bug ← ConcolicExec(t, ρ, L)
    8      if hit_patch then
    9          P ← Reduce(P, φ_t, σ, hit_bug)
    10     end
    11 end
    12 return P

Algorithm 1 shows the general workflow of concolic repair, which implements three phases: (1) patch pool construction (see Section 3.3), (2) path exploration (see Section 3.4), and (3) patch reduction (see Section 3.5). The initial phase of synthesis produces a pool of patches $P$ (see line 1 in Algorithm 1) by leveraging a component-based synthesizer. This patch pool is going to be refined in the following repair loop (see line 2 to 11). The repair loop itself will be continued as long as there are remaining patches to refine and the repair budget is not exceeded. In phase (2) (i.e., inside the repair loop), we pick a new input $t$ to explore more program paths (see line 3). With input $t$ we also retrieve a patch candidate $\rho$ from the patch pool $P$, such that inserting $\rho$ in the patch location allows $t$ to have a feasible path in the patched program. If there is no such input $t$ available, then there is no more input space to explore and the algorithm will return the identified patches (see line 4 to 6). Otherwise, we perform a concolic execution of the program with input $t$, patch candidate $\rho$, and the information about the:
• patch location, where the repair is located and
• bug location, where the buggy behavior is observable.
It results in the path constraint $\phi_t$ and whether the patch location ($hit_{patch}$) and the bug location ($hit_{bug}$) have been exercised by the execution (see line 7). Afterwards, in phase (3), we aim to reduce the patch pool $P$ based on the current observations and the given specification $\sigma$. Before calling the Reduce function in line 9, we check whether the current path actually exercises the patch location (see line 8), otherwise there is no reduction possible.
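The control flow of Algorithm 1 can be sketched as a plain loop in which every phase is injected as a callable. All function names and the stub behavior in `demo` are hypothetical; the sketch only shows how the three phases interleave, not how CPR implements them.

```python
def concolic_repair(synthesize, pick_new_input, concolic_exec, reduce, check_budget):
    """Skeleton of the repair loop; each phase is passed in as a callable."""
    pool = synthesize()                                   # phase 1
    while pool and check_budget():
        picked = pick_new_input(pool)                     # phase 2
        if picked is None:                                # no input left to explore
            return pool
        t, rho = picked
        phi, hit_patch, hit_bug = concolic_exec(t, rho)
        if hit_patch:                                     # otherwise nothing to reduce
            pool = reduce(pool, phi, hit_bug)             # phase 3
    return pool

def demo():
    """Run the loop with stub phases: three inputs, one of which misses the patch."""
    queue = ["t1", "t2", "t3"]
    steps = {"left": 10}

    def synthesize():
        return ["p1", "p2", "p3"]

    def check_budget():
        steps["left"] -= 1
        return steps["left"] >= 0

    def pick_new_input(pool):
        return (queue.pop(0), pool[0]) if queue else None

    def concolic_exec(t, rho):
        # Pretend that input t2 does not reach the patch location.
        return f"phi({t})", t != "t2", True

    def reduce(pool, phi, hit_bug):
        return pool[1:]  # stub: discard one candidate per reduction

    return concolic_repair(synthesize, pick_new_input, concolic_exec, reduce, check_budget)

print(demo())  # ['p3']
```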
3.3 Phase 1: Patch Pool Construction
In order to generate the initial patch pool $P$ we leverage a component-based synthesizer, which focuses on the synthesis of boolean and integer expressions. Our approach assumes that the necessary patch ingredients are provided as input to our technique. This includes the available program variables and the arithmetic/comparison operators for the synthesis. Before starting the actual synthesis we employ a controlled symbolic execution [23] to retrieve the path constraints for the initial test cases. To this end, we mark the patch variables as symbolic at the patch location. The result of this symbolic execution is a set of path constraints with their corresponding expected outputs given by the test cases.
The synthesis starts with generating a set of expression trees based on the available components and the required expression type at the patch location. We support the arithmetic operations $\{+, -, *, /\}$ as well as the remainder operation, the comparison operators $\{=, \neq, <, \le, >, \ge\}$, the boolean operators $\{\wedge, \vee, \neg\}$, and usage of parameters like $\{a, b, c, \ldots\}$. More components can be easily added to our synthesizer by providing them in the SMT-LIB format. For example, for each program to be repaired, the available variables are provided as additional components to the synthesizer. The final set of expression trees contains all feasible combinations of the given components that fit the required expression type. Afterwards, the synthesizer enumerates over these trees and validates that the corresponding expressions repair the program for the constraints retrieved by the controlled symbolic execution. All successfully validated expression trees will be put in the resulting patch pool. If the expression tree includes parameters, the synthesizer will generate a constraint on these parameters (based on a pre-selected range).
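A much reduced sketch of this enumerate-and-validate scheme follows. It only builds depth-one comparison trees of the form (operator, variable, parameter) and validates them against a single test for which the guard is expected to evaluate to true; the component sets, the range [-10, 10], and all function names are assumptions for illustration.

```python
import operator
from itertools import product

COMPARISONS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
               "<=": operator.le, ">": operator.gt, ">=": operator.ge}
VARIABLES = ["x", "y"]
PARAMETERS = ["a", "b"]
PARAM_RANGE = range(-10, 11)  # assumed pre-selected parameter range

def enumerate_trees():
    """All depth-one comparison trees (op, variable, parameter)."""
    return list(product(COMPARISONS, VARIABLES, PARAMETERS))

def admissible_values(tree, test):
    """Parameter values for which the expression is true on the test input."""
    op, var, _param = tree
    return [v for v in PARAM_RANGE if COMPARISONS[op](test[var], v)]

def synthesize(test):
    """Keep every tree that has at least one admissible parameter value."""
    pool = {}
    for tree in enumerate_trees():
        values = admissible_values(tree, test)
        if values:
            pool[tree] = values
    return pool

pool = synthesize({"x": 7, "y": 0})
print(len(pool), len(pool[(">=", "x", "a")]), len(pool[("<", "y", "b")]))  # 24 18 10
```

The entries for `x >= a` and `y < b` reproduce the parameter constraints of patches 1 and 2 in the illustrative example.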
3.4 Phase 2: Path Exploration
The path exploration is concerned with two issues: (a) how
to pick a new input
𝑡
and (2) how to eciently retrieve the
corresponding path constraint
𝜙𝑡
. In the rst loop iteration
the new input is chosen based on the provided test cases
or randomly if there are no test cases available. Afterwards,
based on the previous path constraint, the PickNewInput
function (see line 3 in Algorithm 1) applies generational
search [
10
] to obtain new inputs: by negating every sux
term in the constraint, we can retrieve the maximum number
of new path constraint prexes.
While checking the satisfiability of the obtained path constraint prefixes, we also determine whether there exists a patch candidate $\rho$ in our current patch pool that allows this path to be exercised. In this way, we prune paths for which no patch is feasible. We call this pruning of the input space path reduction. After checking satisfiability, we generate a set of new inputs, which are ranked based on how often they trigger the execution of the patch and bug location. In this way, a set of new inputs is maintained, which can be worked on and extended in every repair iteration. The complete path constraint is then retrieved by concolically executing the new input and injecting the patch formula $\psi_\rho$ (for a patch expression $\rho$) into the path constraint.
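The prefix generation of generational search can be sketched as follows. The representation of a path constraint as a list of `(condition, taken)` pairs, the function names, and the `bound` parameter are assumptions for illustration; the satisfiability check of each prefix and the patch feasibility check described above are not modeled.

```python
def negate(term):
    """Negate a branch term given as a (condition, taken) pair."""
    cond, taken = term
    return (cond, not taken)


def generational_prefixes(path_constraint, bound=0):
    """Return one new prefix per term at index >= bound.

    Each prefix keeps the terms before position i unchanged and ends with
    the negation of term i, so it describes a path that diverges at i.
    """
    return [path_constraint[:i] + [negate(path_constraint[i])]
            for i in range(bound, len(path_constraint))]


path = [("x > 0", True), ("y < 5", True), ("x == y", False)]
prefixes = generational_prefixes(path)
```

A path constraint with three terms yields three candidate prefixes; a solver would then be asked for an input satisfying each prefix, and prefixes for which no patch in the pool is feasible would be dropped.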
3.5 Phase 3: Patch Reduction
The Reduce function in Algorithm 1 (see line 9) tries to shrink the patch pool and, where possible, to refine the available abstract patches. Its workflow is shown in Algorithm 2.
3.5.1 Criterion for Patch Reduction.
For every patch $\rho$ in the patch pool $P$, we need to make sure that there is no violation of the specification $\sigma$ for all inputs that are specified by the given path constraint. Otherwise, the patch needs to be removed. More specifically, we need to make sure that there exist parameter values $a_i \in A$ within the constraint $T_\rho(A)$ so that for all inputs $x_i \in X$ that satisfy the path constraint $\phi(X)$ and the patch formula $\psi_\rho(X, A)$, there is no violation of the specification $\sigma(X)$. Given $A = \{a_1, a_2, \ldots, a_n\}$ and $X = \{x_1, x_2, \ldots, x_m\}$, this means:

$$\exists a_1, a_2, \ldots, a_n \;\; \forall x_1, x_2, \ldots, x_m : \;\; \phi(X) \wedge \psi_\rho(X, A) \wedge T_\rho(A) \implies \sigma(X) \qquad (1)$$

In our approach, we do not only ensure that there exists one value for each parameter $a_i$, but we iteratively refine the constraint $T_\rho(A)$ to reduce the patch space as much as possible and to ensure that the specification holds for all (refined) values of each parameter $a_i$:

$$\forall a_1, a_2, \ldots, a_n \;\; \forall x_1, x_2, \ldots, x_m : \;\; \phi(X) \wedge \psi_\rho(X, A) \wedge T_\rho(A) \implies \sigma(X) \qquad (2)$$

We want formula (2) to hold after refinement, and hence it is used to guide our abstract patch refinement.

Concolic Program Repair PLDI ’21, June 20–25, 2021, Virtual, Canada

Algorithm 2: Reduce function
Input: patch pool P, path constraint φ, specification σ, bug location hit hit_bug
Output: reduced patch pool P'
 1  P' ← P
 2  for ρ ∈ P do
 3      π ← φ(X) ∧ ψ_ρ(X, A) ∧ T_ρ(A)
 4      if IsSat(π) then
 5          if hit_bug then
 6              P' ← P' \ ρ
 7              T'_ρ ← RefinePatch(φ, ρ, T_ρ, σ)
 8              if T'_ρ ≠ False then
 9                  P' ← P' ∪ {ρ with T'_ρ}
10              end
11          end
12          UpdateRanking(ρ)
13      end
14  end
15  return P'
3.5.2 Reduction Algorithm.
Algorithm 2 describes the reduction function for abstract patches. The function iterates over every patch and searches for specification violations. Before calling the patch refinement in line 7, there are two additional pre-checks to make sure that we can reason about the patch within the current path constraint. First, we check whether the path constraint $\phi$ and the current patch $\rho$ are jointly feasible (see lines 3 and 4). Second, we check whether the bug location is exercised by the current execution (see line 5), so that the buggy behavior is observable.
If both checks pass, we investigate whether the patch $\rho$ with constraint $T_\rho$ needs to be refined by searching for counterexamples to formula (2). The only option for patch refinement, based on our definition of abstract patches (see Section 3.1), is to refine the constraint $T_\rho$. The implementation details of the patch refinement are presented in Section 4. If no refinement is feasible, the patch is removed.
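The control flow of the reduction can be sketched under strong simplifying assumptions: inputs and parameter values come from small finite sets, and brute-force enumeration stands in for the SMT queries IsSat and RefinePatch. The function names, the dictionary layout of the pool, and the toy specification (a division by `x` must not see `x == 0`) are hypothetical. Unlike line 12 of Algorithm 2, the sketch records a ranking update only for patches that stay in the pool.

```python
def is_sat(inputs, params, phi, psi):
    """Brute-force stand-in for IsSat: some (x, a) satisfies phi and psi."""
    return any(phi(x) and psi(x, a) for x in inputs for a in params)


def refine(inputs, params, phi, psi, sigma):
    """Keep the parameter values without a violation on this path (formula (2))."""
    return {a for a in params
            if all(sigma(x) for x in inputs if phi(x) and psi(x, a))}


def reduce_pool(pool, inputs, phi, sigma, hit_bug):
    """pool maps a patch name to (psi, set of allowed parameter values)."""
    reduced, to_rank = {}, []
    for name, (psi, params) in pool.items():
        if not is_sat(inputs, params, phi, psi):
            reduced[name] = (psi, params)  # cannot reason about it on this path
            continue
        if hit_bug:
            params = refine(inputs, params, phi, psi, sigma)
            if not params:
                continue  # no refinement feasible: the patch is dropped
        reduced[name] = (psi, params)
        to_rank.append(name)
    return reduced, to_rank


pool = {
    "x >= a": (lambda x, a: x >= a, set(range(-10, 11))),
    "true": (lambda x, a: True, {0}),  # parameter-free patch, dummy parameter
}
reduced, to_rank = reduce_pool(pool, range(-20, 21), lambda x: True,
                               lambda x: x != 0, hit_bug=True)
```

Here the guard `x >= a` is refined to `a` in [1, 10], because any smaller `a` lets `x == 0` reach the division, and the constant guard `true` is removed.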
3.5.3 Patch Ranking.
In addition to reducing the patch space, our approach ranks the remaining patches. The rank of each patch $\rho$ is increased whenever the patch is feasible with the path constraint $\phi$ (see line 12 in Algorithm 2). Otherwise, the ranking is not modified because we cannot reason about the patch with regard to the current path constraint. If the path exercises the bug location, the patch is ranked higher still (as compared to the situation where it does not exercise the bug location). Intuitively, this means that (1) patches that are compatible with the current path constraint are ranked higher because we have seen more evidence for their correctness (in terms of the explored input space), and (2) patches that also exercise the bug location are ranked even higher because they exercised the program location where potential errors are observable. Patches that are compatible with the path constraint but do not exercise the bug location could still be erroneous, but there has been no possibility to observe the error. We only rank those patches that do not show any violation of the specification for the explored input space.
In addition, we deprioritize patches that change the program behavior significantly, specifically deletion of functionality, which can happen if a patch changes the guard of a conditional statement to a tautology or a contradiction. Based on formula (2), we cannot remove these patches because they do not violate the specification. However, functionality deletion is in general not desirable; as stated in a recent study [26], such functionality-deleting patches are present in earlier works on search-based program repair and are overfitting. Although we cannot remove these patches, our patch ranking mechanism deprioritizes them. Therefore, for all patch candidates, we check whether the insertion of the patch affects the control flow of the inputs flowing through the path (even if the insertion of the patch does not violate the user-provided specification). We deprioritize such patches and increase the rank of the other patches, and this ranking fine-tuning is accumulated over all the paths explored. Further fine-tuning of this heuristic is possible via model counting [5, 11] to find the proportion of inputs in a path affected by a patch insertion.
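The accumulation of ranking evidence over explored paths might look like the following sketch. The concrete weights (+1 for a feasible path, +2 if the bug location is also hit, -1 for a patch that alters control flow) and the function names are illustrative assumptions; the text above only fixes the direction of each adjustment, not its magnitude.

```python
def update_ranking(scores, patch, feasible, hit_bug, alters_control_flow):
    """Accumulate evidence for one patch on one explored path."""
    if not feasible:
        return  # cannot reason about the patch on this path: no change
    if alters_control_flow:
        # e.g. a guard turned into a tautology or contradiction
        scores[patch] = scores.get(patch, 0) - 1
        return
    scores[patch] = scores.get(patch, 0) + (2 if hit_bug else 1)


def ranked(scores):
    """Patches from most to least evidence; ties broken by name."""
    return sorted(scores, key=lambda p: (-scores[p], p))


scores = {}
# Path 1 exercises the bug location.
update_ranking(scores, "x >= 1", True, True, False)
update_ranking(scores, "true", True, True, True)
update_ranking(scores, "x == 4", False, True, False)
# Path 2 does not exercise the bug location.
update_ranking(scores, "x >= 1", True, False, False)
update_ranking(scores, "x == 4", True, False, False)
update_ranking(scores, "true", True, False, True)
order = ranked(scores)
```

After the two paths, the functionality-deleting patch `true` ends up last even though it never violated the specification.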
4 Abstract Patch Refinement
During patch space reduction (see Algorithm 2), we try to refine the available abstract patches whenever we identify a corresponding violation of the specification $\sigma$. This is achieved by efficiently refining the parameter constraint $T_\rho$ of the abstract patch $\rho$, as shown in Algorithm 3.
PLDI ’21, June 20–25, 2021, Virtual, Canada Ridwan Sharideen, Yannic Noller, Lars Grunske, and Abhik Roychoudhury
Removal of non-refinable constraints.
Before starting the fine-grained refinement of $T_\rho$, Algorithm 3 checks whether a refinement of $T_\rho$ is feasible that makes the specification pass. It checks whether (a) the conjunction of the path constraint with the specification (see formula $\omega_{pass1}$ in line 1) is satisfiable, followed by the check whether (b) the conjunction of the path constraint with the current patch constraint still allows the specification to pass (see formula $\omega_{pass2}$ in line 3). If (a) is satisfiable but (b) is unsatisfiable, the parameter constraint does not contain any value that repairs the specification violation, and hence the patch can be discarded completely.
Counterexample exploration.
After these initial checks, the algorithm searches for counterexamples to the general formula (2) from Section 3.5.1 (see formula $\omega_{fail}$ in line 8). They capture violations of the specification, which need to be excluded by our refinement of $T_\rho$. If no model exists for formula $\omega_{fail}$, the parameter constraint needs no further refinement and the current constraint can be returned (see line 31). But if there is a model $m_A$, the Split function removes the model from the current constraint $T_\rho$ and splits it into multiple regions (see line 11).
Region representation.
We assume that the parameter constraint can be split into $k$ regions $R = \{r_1, r_2, \ldots, r_k\}$ so that the constraint represents the disjunction of the separate regions. This limits the search space during refinement and can lead to the removal of regions that do not satisfy the specification. For example, consider a parameter space with one parameter $a$ and the constraint $T_\rho(a) := (l \le a) \wedge (a \le u)$. Given the counterexample $m_a$, the Split function replaces the existing region with two new regions:

$$r_1 := (l \le a) \wedge (a \le m_a - 1)$$
$$r_2 := (m_a + 1 \le a) \wedge (a \le u)$$

Even if $T_\rho$ already consists of multiple regions, only one region is affected by the removal of the counterexample. In general, there will be $3^n - 1$ additional regions introduced (where $n$ is the number of parameters), some of which might be merged later with surrounding regions.
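A minimal sketch of such a split for integer parameters is given below, assuming each region is an axis-aligned box of inclusive bounds. The function name `split` and the list-of-pairs representation are illustrative choices, not the data structures of the CPR implementation. Each dimension is cut into the part below, equal to, and above the counterexample value, which gives $3^n$ cells; the cell containing only the counterexample is dropped, and empty cells at the border of the box are discarded.

```python
from itertools import product


def split(region, model):
    """Remove the point `model` from the integer box `region`.

    region: list of (low, high) inclusive bounds, one pair per parameter.
    model:  list of parameter values, one per parameter, inside the box.
    Returns the non-empty sub-boxes that cover region minus the model point.
    """
    per_dim = []
    for (low, high), m in zip(region, model):
        per_dim.append([(low, m - 1), (m, m), (m + 1, high)])  # below, equal, above
    boxes = []
    for combo in product(*per_dim):
        if all(lo == hi == m for (lo, hi), m in zip(combo, model)):
            continue  # this cell is exactly the counterexample
        if all(lo <= hi for lo, hi in combo):
            boxes.append(list(combo))
    return boxes


one_param = split([(-10, 10)], [4])
two_params = split([(-10, 10), (-10, 10)], [0, 0])
```

For one parameter, the result consists of the two regions $r_1$ and $r_2$ from the example above; for two parameters and an interior counterexample, eight regions are produced.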
Recursive refinement.
The algorithm further checks for specification violations (see lines 16 to 26) by recursively calling the refinement function on the regions (see line 19). Each recursive call is guarded by a check whether the current region $r_i$ is compatible with the path constraint $\phi$ and the current patch formula (see lines 17 and 18). Otherwise, we cannot reason about the region. After iterating over all regions, the algorithm attempts to merge contiguous regions (see line 27) and finally returns the disjunction of the refined parameter regions (see line 28).
Algorithm 3: RefinePatch function
Input: path constraint φ, abstract patch ρ, parameter constraint T_ρ, specification σ
Output: refined constraint T'_ρ
 1  ω_pass1 ← φ(X) ∧ σ(X)
 2  if IsSat(ω_pass1) then
 3      ω_pass2 ← φ(X) ∧ ψ_ρ(X, A) ∧ T_ρ(A) ∧ σ(X)
 4      if ¬IsSat(ω_pass2) then
 5          return False
 6      end
 7  end
 8  ω_fail ← φ(X) ∧ ψ_ρ(X, A) ∧ T_ρ(A) ∧ ¬σ(X)
 9  m_A ← GetModel(ω_fail)
10  if m exists then
11      R = {r_1, r_2, ..., r_k} ← Split(T_ρ, m_A)
12      if R = ∅ then
13          return False
14      else
15          R' ← {}
16          for r_i ∈ R do
17              π ← φ(X) ∧ ψ_ρ(X, A) ∧ r_i(A)
18              if IsSat(π) then
19                  r'_i ← RefinePatch(φ, ρ, r_i, σ)
20                  if r'_i ≠ False then
21                      R' ← R' ∪ {r'_i}
22                  end
23              else
24                  R' ← R' ∪ {r_i}
25              end
26          end
27          R' ← Merge(R')
28          return ⋁_{r'_i ∈ R'} r'_i
29      end
30  else
31      return T_ρ
32  end
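The recursive structure of Algorithm 3 can be illustrated for a single integer parameter. This is a simplified sketch, not the CPR implementation: brute-force enumeration over a finite input set replaces GetModel, the pre-checks in lines 1 to 7 and the IsSat guard in line 18 are omitted, and the names `refine_patch` and `merge` as well as the interval representation are assumptions.

```python
def refine_patch(inputs, phi, psi, sigma, region):
    """Refine the inclusive integer interval `region` = (low, high).

    Returns a list of disjoint intervals (their disjunction is the refined
    constraint); the empty list plays the role of False.
    """
    low, high = region
    if low > high:
        return []
    path_inputs = [x for x in inputs if phi(x)]
    # Counterexample search: a parameter value that admits a violating input.
    model = next((a for a in range(low, high + 1)
                  if any(psi(x, a) and not sigma(x) for x in path_inputs)), None)
    if model is None:
        return [region]  # no counterexample: keep the region unchanged
    refined = []
    for sub in [(low, model - 1), (model + 1, high)]:  # split around the model
        refined += refine_patch(inputs, phi, psi, sigma, sub)
    return merge(refined)


def merge(intervals):
    """Merge contiguous or overlapping inclusive integer intervals."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(hi, merged[-1][1]))
        else:
            merged.append((lo, hi))
    return merged


inputs = range(-20, 21)
on_path = lambda x: True        # path constraint phi
crash_free = lambda x: x != 0   # specification sigma
guard_ge = refine_patch(inputs, on_path, lambda x, a: x >= a, crash_free, (-10, 10))
guard_eq = refine_patch(inputs, on_path, lambda x, a: x == a, crash_free, (-10, 10))
```

With the toy specification that `x == 0` must not pass the guard, `x >= a` is refined to the single region [1, 10], while `x == a` keeps the two regions [-10, -1] and [1, 10].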
5 Evaluation
The goal of our work is to efficiently navigate the patch space and find the correct patch that works beyond the provided test suite. We compare our technique with the related counterexample-guided inductive synthesis (CEGIS) [31, 32] because it can also be employed to navigate the patch space via patch refinement in order to generate the correct patch. Note that the above proposed technique of concolic program repair is not tailored to a specific class of errors. However, its low dependence on existing test cases fits the context of repairing security vulnerabilities well. Therefore, we present an empirical comparison with the state-of-the-art program repair tools Angelix [23] and Prophet [21], and also with the recently proposed tool ExtractFix [8] for repairing security vulnerabilities. To highlight CPR’s general repair capabilities, we also include additional subjects from the ManyBugs [13] benchmark. Furthermore, we show CPR’s ability to fix logical errors for subjects from the SV-COMP benchmark [33]. All experimental data, as well as the open-source CPR tool, are available from: https://cpr-tool.github.io/
Benchmark Suite.
ExtractFix [8] is a state-of-the-art vulnerability repair tool, which generates fixes for security vulnerabilities by computing a crash-free constraint using a sanitizer. The crash-free constraint is used as the oracle for patch generation, and in our case, it can serve as the program specification. We follow a different workflow by first synthesizing patches at a given fault location and then gradually improving them based on a concolic exploration. We use their benchmark, which includes real-world applications with reported security vulnerabilities, and hence it can be used to evaluate the efficacy of our technique in repairing security vulnerabilities. The collected subjects from the ManyBugs [13] benchmark are a subset of programs that can be handled by our underlying concolic engine KLEE [4]. Most of these subjects represent general errors. SV-COMP [33] is a common benchmark for evaluating the effectiveness and efficiency of state-of-the-art verification techniques. We identified C programs from SV-COMP that include reachable assertion errors and for which there is another program in the benchmark that represents a repaired version (i.e., the assertion is present but the error is not reachable), where the repair is not just a modification of the assertion’s condition but a logical change in the program before the assertion is reached. For our experiments, we have chosen 10 programs that satisfy the stated conditions.
Experimental Setup.
Our implementation of the concolic engine is an extension of KLEE [4]. All experiments are conducted on a Dell PowerEdge R530 with an Intel(R) Xeon(R) CPU E5-2660 processor and 64GB RAM. We use Docker containers to exploit and repair the vulnerable applications. The experiments have been executed with a timeout of 1 hour to match the experiments of ExtractFix [8], allowing comparison with other repair tools. The language components for the synthesis are selected as needed for the specific subject, and the parameters for the abstract patches have been limited to the range [-10, 10]. For each experiment, (at least) one failing test case is provided as the initial test case. For subjects in the ExtractFix benchmark, the failing test case is the exploit. For subjects in the ManyBugs benchmark, there are multiple failing and passing test cases, but we provide CPR only with the failing test cases. For subjects in SV-COMP, we manually generate a failing test to trigger assertion errors. For ExtractFix and ManyBugs, we derive simple specifications from the programs themselves, e.g., that a program should not return an erroneous status code. The specification for the SV-COMP subjects is directly extracted from the included assertions. For our experiments, the fault locations have been provided manually to CPR.
Our CEGIS Implementation.
CEGIS comes in various forms in existing works [1, 31, 32]. We implement our own custom version of CEGIS following the concepts in [32], reusing as many components as possible from our tool CPR so that we can enable a fair comparison between the concepts with minimized impact of implementation differences. More specifically, our CEGIS implementation reuses CPR’s concolic engine to provide a common path exploration for both techniques and reuses CPR’s synthesizer to explore the same patch space. This custom CEGIS implementation supports patch generation using a counterexample-guided refinement of the synthesis constraint. It starts with a concolic exploration of the input space to construct a set of path constraints. Afterwards, we synthesize a patch for the derived constraints (i.e., the user-provided specification and the program paths witnessed in the previous concolic exploration). We then verify whether the synthesized patch admits a counterexample such that the specification is violated. If a counterexample is found, the current patch is thrown away and the counterexample model is added to the synthesis constraint. The synthesizer then generates a new patch, and the iteration continues until there is no further counterexample or the patch space is covered.
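The synthesize/verify loop described above can be sketched generically. The sketch is not our CEGIS implementation: candidates come from a plain iterable instead of a synthesizer, the verifier enumerates a finite input set instead of querying a solver over path constraints, and the names `cegis` and `violates` are hypothetical.

```python
def cegis(candidates, inputs, violates):
    """Counterexample-guided loop over a finite candidate and input space.

    candidates: iterable of patches.
    violates(patch, x): True if input x makes the patch break the specification.
    Returns (patch, counterexamples); patch is None if the space is covered.
    """
    counterexamples = []
    for patch in candidates:
        # Synthesis step: reject patches refuted by known counterexamples.
        if any(violates(patch, x) for x in counterexamples):
            continue
        # Verification step: search for a new counterexample.
        cex = next((x for x in inputs if violates(patch, x)), None)
        if cex is None:
            return patch, counterexamples  # first patch without counterexample
        counterexamples.append(cex)
    return None, counterexamples


# Toy patch space: guards `x >= a` for a few thresholds a; x == 0 must be blocked.
violates = lambda a, x: x >= a and x == 0
patch, cexs = cegis([-2, -1, 0, 1, 2], range(-3, 4), violates)
```

The loop stops at the first candidate that survives verification, which illustrates why a CEGIS-style search returns a single patch rather than maintaining a ranked pool.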
It is necessary to limit the concolic exploration of CEGIS to make the techniques comparable. In our experiments, we split the overall timeout of 1 hour for CEGIS into 30 minutes of initial path exploration and 30 minutes of patch refinement. The conceptual difference between CEGIS and CPR is that CEGIS explores the patch space and input space one patch and one input at a time, while CPR explores partitions in both the patch space and the input space.
5.1 Our CEGIS Implementation
Table 1. Comparison between our CEGIS implementation and CPR with regard to patch pool reduction ratio and input space reduction ratio. Benchmark: ExtractFix. The experiments have been executed with a timeout of 1 hour.
    Buggy Program               Components         Our CEGIS Implementation                      CPR
ID  Project    Bug ID           General  Custom    |P_Init|  |P_Final|  Ratio   φ_E  Correct?    |P_Init|  |P_Final|  Ratio   φ_E   φ_S  Rank
 1  Libtiff    CVE-2016-5321          2       3         174        174     0%    17        no         174        104    40%    67    77     2
 2  Libtiff    CVE-2014-8128          4       3         260        260     0%     0        no         260        260     0%     0     0     1
 3  Libtiff    CVE-2016-3186          4       3         130        130     0%    13        no         130        130     0%    13     1    11
 4  Libtiff    CVE-2016-5314          4       4         199        198     1%    10        no         199        197     1%    21     4     2
 5  Libtiff    CVE-2016-9273          4       3         260        260     0%     5        no         260        141    46%    10     2     8
 6  Libtiff    bugzilla 2633          4       3         130        130     0%    66        no         130        130     0%   109    21     8
 7  Libtiff    CVE-2016-10094         4       3         130        130     0%    23        no         130         77    41%    34   114     6
 8  Libtiff    CVE-2017-7601          4       2          94         94     0%    27        no          94         94     0%    78   107     2
 9  Libtiff    CVE-2016-3623          4       3         130        130     0%    60        no         130        100    23%   102    21     1
10  Libtiff    CVE-2017-7595          4       3         130        130     0%    10        no         130        130     0%    18    31     1
11  Libtiff    bugzilla 2611          4       3         130        130     0%    61        no         130        112    14%    87    15     1
12  Binutils   CVE-2018-10372         5       3          74         74     0%     9        no          74         39    47%    25     1    33
13  Binutils   CVE-2017-15025         4       3         130        130     0%     0        no         130        130     0%     0     0     6
14  Libxml2    CVE-2016-1834          4       3         260        260     0%     6        no         260        260     0%    22     0    12
15  Libxml2    CVE-2016-1838          4       4         199        199     0%     4        no         199        199     0%     4     0    10
16  Libxml2    CVE-2016-1839          5       3          65         65     0%     0        no          65         65     0%     0     0    14
17  Libxml2    CVE-2012-5134          4       3         260        260     0%    44        no         260        134    48%    80   271     7
18  Libxml2    CVE-2017-5969          4       3         260        260     0%     0        no         260        154    41%    21     2     1
19  Libjpeg    CVE-2018-14498         4       3         260        260     0%    42        no         260        128    51%    78   108     2
20  Libjpeg    CVE-2018-19664         4       3         130        130     0%    43        no         130        130     0%    84    26     1
21  Libjpeg    CVE-2017-15232         5       3         955        955     0%     0        no         955        955     0%     0     0    26
22  Libjpeg    CVE-2012-2806          4       3         260        259     0%    68        no         260        145    44%   110     3     3
23  FFmpeg     CVE-2017-9992          6       3         N/A        N/A    N/A   N/A       N/A         N/A        N/A    N/A   N/A   N/A   N/A
24  FFmpeg     Bugzilla-1404          4       2         N/A        N/A    N/A   N/A       N/A         N/A        N/A    N/A   N/A   N/A   N/A
25  Jasper     CVE-2016-8691          4       3         260        260     0%    72        no         260         96    63%    69     7     1
26  Jasper     CVE-2016-9387          5       3          65         65     0%    54        no          65         17    74%   111     1
27  Coreutils  Bugzilla 26545         5       3        1025       1025     0%    74        no        1025        949     7%   119     2    25
28  Coreutils  GNUBug 25003           4       4         199        198     1%   114        no         199        172    14%   196     0     6
29  Coreutils  GNUBug 25023           4       2          64         64     0%    32        no          64         64     0%     1     2     7
30  Coreutils  Bugzilla 19784         4       3           -          -      -     -         -         770        770     0%     6     0    38

Table 2. Comparison with repair tools. The experiments have been executed with a timeout of 1 hour [8]. For Prophet and Angelix, the results show only the top-ranked patch, while for ExtractFix the results capture the only patch generated.
Benchmark: ExtractFix
                  Generated Patches               Correct Patches
Program    #Vul   Prophet  Angelix  ExtractFix    Prophet  Angelix  ExtractFix
Libtiff      11         7        7           9          1        0           6
Binutils      2         -        -           2          -        -           1
Libxml2       5         3        0           4          0        0           2
Libjpeg       4         3        -           3          1        -           2
FFmpeg        2         -        -           2          -        -           2
Jasper        2         2        2           2          0        0           1
Coreutils     4         2        -           2          0        -           2
Total        30        17        9          24          2        0          16

Table 1 shows the results of the comparison between the two techniques. Column Components indicates the number of language components passed to our synthesizer. The sub-columns General and Custom represent the number of components from the general synthesis language and the number of custom components created specifically for the respective test subject. Columns $|P_{Init}|$ and $|P_{Final}|$ show the number of patches in the plausible patch space at the start and at the end of the refinement, respectively. CEGIS does not maintain a patch pool like CPR, but only generates one patch that satisfies the collected constraints. However, the current patch pool size can be calculated by instructing the synthesizer to produce all currently feasible patches. $|P_{Init}|$ is the same for CEGIS and CPR because we share the same inputs and synthesizer. Column Ratio shows the percentage of the patch space reduction. Column $\phi_E$ indicates the number of program paths explored for the refinement. Column $\phi_S$ indicates the number of program paths skipped during the refinement due to patch infeasibility. Column Correct? indicates whether CEGIS finishes with a patch that is syntactically or semantically equivalent to the developer patch, and column Rank shows the corresponding highest rank position. The N/A values for IDs 23 and 24 in Table 1 indicate that neither CEGIS nor CPR was able to produce any results because the execution of the test driver code resulted in an unexpected memory fault in our underlying concolic execution engine. The "-" signs for CEGIS for ID 30 mean that it was not able to generate any patch within the timeout.
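The Ratio column can be related to the pool sizes with a one-line computation. The formula is not spelled out in the text; the sketch assumes that the ratio is the share of the initial pool that was discarded, rounded to a whole percent, and the function name `reduction_ratio` is hypothetical. This assumption agrees with the Table 1 rows checked in the example.

```python
def reduction_ratio(p_init, p_final):
    """Share of the initial patch pool that was discarded, in whole percent."""
    if p_init == 0:
        return 0
    return round(100 * (p_init - p_final) / p_init)


# Pool sizes taken from Table 1 (CPR columns, IDs 1, 26 and 22).
ratios = [reduction_ratio(174, 104), reduction_ratio(65, 17), reduction_ratio(260, 145)]
```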
Table 3. Performance of CPR with regard to patch pool reduction ratio and input space reduction ratio for additional subjects from the ManyBugs benchmark. The experiments have been executed with a timeout of 1 hour.
    Buggy Program          Components         CPR
ID  Project  Subject ID    General  Custom    |P_Init|  |P_Final|  Ratio   φ_E   φ_S  Rank
 1  Libtiff  ee65c74             4       3           6          6     0%    29    90     1
 2  Libtiff  865f7b2             4       3         130        130     0%    24    68     5
 3  Libtiff  7d6e298             5       4           4          2    50%     7     7     1
 4  gzip     884ef6d16c          5       4        4821       4821     0%    11     0    36
 5  gzip     f17cbd13a1          5       4           2          2     0%     0     1     1

Input and patch space exploration.
The comparison of the Ratio columns in Table 1 shows that in 14 of 30 cases CPR produces a significantly better patch space reduction than CEGIS. In the remaining 16 cases, both perform similarly. For a few subjects, CPR resulted in 0% reduction, partly because of the loop unrolling (and hence longer paths) in symbolic execution. While this is an area we can work on, the $\phi_S$ column shows that CPR is already effective in combating path explosion by skipping additional paths over and above normal concolic execution. For all subjects for which CPR produces a patch space reduction > 1%, it outperforms CEGIS. Furthermore, the $\phi_E$ columns show that CPR is also more efficient in exploring the input space: in 20 of 30 cases CPR explores more path constraints than CEGIS, in 2 cases CEGIS shows better results, and for the remaining 8 cases both perform similarly. Additionally, CPR can effectively skip infeasible path constraints (see Column $\phi_S$).
Furthermore, CEGIS requires an initial path exploration to construct the constraint for later patch verification. Therefore, in order to verify a patch, CEGIS uses a set of symbolic paths that capture a portion of the program specification. In contrast, our technique CPR is an anytime algorithm that uses a single program path at a time for patch refinement. Processing a single path at a time, compared to a set of paths, is more efficient during constraint solving.
Finding 1: CPR is more effective than CEGIS with regard to input space and patch space exploration.
Identifying the correct patch.
In none of our 30 test subjects can CEGIS identify a patch that is syntactically or semantically equivalent to the developer patch (see Column Correct?). The reason is that as soon as CEGIS identifies a patch that does not violate the specification for the previously collected path constraints, it terminates and returns this patch. In our experiments, such a patch is often a tautology or contradiction, which can be semantically equivalent to code deletion, as the patch would enforce early termination of the program to avoid the bug location. CPR includes such patches in the patch space (as long as they do not violate any specification), but our ranking system deprioritizes such patches (see Section 3.5.3). Column Rank shows that CPR ranks the developer patch (or a semantic equivalent) relatively high, in 20 cases in the Top-10.
Finding 2: CEGIS tends to favor a simple patch that represents the deletion of functionality, which overfits to the given specification. CPR can leverage its ranking capabilities to identify the correct patch.
5.2 Existing Program Repair Tools
CPR can be leveraged for constraint-driven repair, i.e., having just a few or no test cases, but a constraint that can be used as a repair oracle. For this purpose, we focus on the comparison with the most recently proposed constraint-driven repair technique ExtractFix [8] and its corresponding data set. On the data set of ExtractFix, CPR generates the correct patch in top position for 7/30 subjects and in second position for 4/30 subjects, as shown in Table 1.
As already mentioned, ExtractFix uses a crash-free constraint as the guiding oracle to generate a patch. ExtractFix computes the weakest precondition for the patch by back-propagating the crash-free constraint. Conceptually, ExtractFix explores the patch space using the crash-free constraint to determine the patch and then evaluates the effectiveness of the patch for the input space. In contrast, CPR can use the same crash-free constraint but explores the input space to determine the invalid values that can violate the crash-free constraint, and uses this information to evaluate the effectiveness of the patch. The tool ExtractFix is also compared with the conventional test-based repair tools Prophet and Angelix in [8].
Table 2 from [8] shows the results on the same security vulnerability benchmark. Column #Vul shows the count of vulnerabilities for each subject, which is 30 in total. The columns Generated Patches and Correct Patches show the number of vulnerabilities for which the techniques generated plausible and correct patches (i.e., syntactically or semantically equivalent to the developer patch), respectively.
Overall, we note that ExtractFix is a customized tool for repairing security vulnerabilities that hooks into specific sanitizers, whereas ours is a general-purpose program repair machinery. Table 3 shows the results from test-based repair of ManyBugs subjects [13] that require a general-purpose repair technique; these cannot be handled by ExtractFix. CPR can generate correct patches for all of them by leveraging the failing tests to drive concolic path exploration. In the future, it is also possible to experimentally evaluate the usage of passing tests to drive concolic exploration in CPR.
Since Prophet and Angelix are test-driven general repair techniques, in addition to the failing test case, the available developer test suites are provided to both Angelix and Prophet (the programs in Table 2 come with test suites from developers). ExtractFix and CPR do not need additional tests.

Table 4. Performance of CPR with regard to patch pool reduction ratio and input space reduction ratio for the repair of logical errors in SV-COMP. The experiments have been executed with a timeout of 1 hour.
                                  Components         CPR
ID  Subject                       General  Custom    |P_Init|  |P_Final|  Ratio   φ_E   φ_S  Rank
 1  loops/insertion_sort                4       3         260        132    49%   120     0     1
 2  loops/linear_search                 4       3         260        127    51%   109    17     1
 3  loops/string                        2       3         676        676     0%    37     0     2
 4  loops/eureka                        5       3          29         29     0%   107    27     3
 5  loops-crafted-1/nested_delay        4       3         260        117    55%     9     8     4
 6  loops/sum                           4       3         260        236     9%   116     0     1
 7  array-examples/bubble_sort          4       3         260        144    45%    34    19     2
 8  array-examples/unique_list          1       2           5          4    20%   134    11     1
 9  array-examples/standard_run         4       3         260        126    52%    68    41     1
10  recursive/addition                  5       3          38         14    63%   138     1     4
Angelix and Prophet.
In contrast to our approach, ExtractFix is driven only by the initial test case, while Angelix and Prophet both use additional developer test cases. Despite being provided with additional test cases, neither Angelix nor Prophet can produce many correct patches. Prophet can only identify correct patches for 2 of the vulnerabilities, and Angelix is not able to correctly fix any of them with the top-ranked patch. Most of the correct patches represent updated or inserted conditions, which are in the search space of both techniques. However, as mentioned in ExtractFix [8], the developer-provided tests for this benchmark are very limited, which may lead to overfitting patches. Therefore, Angelix cannot generate a rich specification for synthesis, and Prophet suffers from a large search space. Prophet and Angelix have the potential to repair more vulnerabilities if more tests are available and if more of their ranking is examined, i.e., beyond the top-ranked patch.
Finding 3: Experimental evidence shows that CPR can be used as a test-guided general-purpose repair tool, as well as a tool for repairing security vulnerabilities.
5.3 Fixing Logical Errors
We further evaluate CPR on its capability to repair logical errors of a program, specified via assertions or rich-text comments in the source code. Therefore, we investigate the possibility of repairing programs beyond simple oracles such as crash-freedom. We evaluate the efficacy of CPR in fixing logical errors on subjects from the SV-COMP benchmark, which is popular for automated program verification and provides such program specifications. As mentioned earlier, for our chosen SV-COMP programs the developer-provided patch is available in the form of another program (so we can check whether CPR produced the correct patch), and the developer-provided patch is not merely a change of the assertion but involves a change in the functionality.
Table 4 presents the results. The meaning of the columns is similar to Table 1 in Section 5.1. For all subjects, CPR can identify correct patches in the patch pool. Furthermore, due to the efficient space exploration, CPR achieves a patch space reduction ratio of up to 63%. Only for one subject (loops/eureka) was CPR not able to produce any patch space reduction. The reason is that the assertion in the program was not strong enough to identify violations. However, CPR was still able to rank the correct patch at position 3. In fact, for all of the 10 subjects CPR ranks the correct patches in the Top-10, and for five of them as Top-1.
Finding 4: CPR effectively repairs logical errors in SV-COMP, and ranks correct patches in the Top-10 for all programs in our experiments.
5.4 Internal Evaluation of CPR Components
Parameter Range.
As mentioned in our Experimental Setup section, the parameters for the abstract patches in our experiments are limited to the range [-10, 10]. We conducted additional experiments to show the effects of other ranges. The results in Table 5 show that the number of initial patch candidates ($|P_{Init}|$) grows with a larger parameter range. The effort for the initial patch pool construction is not largely affected because the concrete values for the parameters are not enumerated but abstracted in the range. The ranking of the correct patch itself is not necessarily affected, as our experiments show. For Jasper/CVE-2016-8691, the correct patch is correctly identified after the first iteration. For Libtiff/CVE-2016-10094, the parameter range needs to include the constant 4 so that CPR can identify the correct patch. With a too narrow range like [-1, 1], CPR cannot identify the correct patch.
Table 5. Impact of different parameter ranges on the repair success of CPR. Benchmark: selection of ExtractFix. The experiments have been executed with a timeout of 1 hour.
Buggy Program             Parameter      CPR
Project  Bug ID           Range          #Iter.   φ_E  |P_Init|  |P_Final|  Ratio  Rank
Jasper   CVE-2016-8691    [-1, 1]            70    68        44         15    66%     1
                          [-10, 10]          70    69       260         96    63%     1
                          [-100, 100]        70    79      2420        907    63%     1
Libtiff  CVE-2016-10094   [-1, 1]            35    34        22         10    55%     -
                          [-10, 10]          35    34       130         77    41%     6
                          [-100, 100]        27    26      1210        887    27%     6

Table 6. Average ratio of the number of generated inputs that hit the patch and bug location.
Benchmark    Avg. PatchLoc Hit  Avg. BugLoc Hit
ExtractFix              74.36%           40.23%
ManyBugs                57.14%           65.15%
SV-COMP                 76.33%           79.08%

Input Generation.
The additional generation of inputs is an essential part of our path exploration phase (see Section 3.4). Our search heuristics drive the input generation to the bug location. Hitting the bug location is crucial, not only to rule out patches, but also to improve the patch ranking. Table 6 shows how often our generated inputs hit the patch and bug location on average. The results show that, to a large extent, our generated inputs do exercise the patch and bug location. However, for the ExtractFix benchmark, the hit rate for the bug location is comparably low at 40.23%. In contrast to the SV-COMP subjects, where the inputs represent primitive data types, the ExtractFix subjects require complex input structures like images or XML files. Our input generation does not use an application-specific input grammar, which could lead to a significant improvement.
Patch Ranking.
The changes in our ranking are based on whether the generated inputs exercise the patch and bug location under the specific patches. For many subjects, the ranking of the correct patch is already very high after the first few iterations and does not change later. Our path exploration starts with inputs that exercise paths that are close to the path of the failing test case: hitting the bug location is more likely for those inputs. In some subjects, the ranking improved gradually over the repair time, e.g., for Coreutils/Bugzilla 26545 the correct patch starts at position 104 and improves to position 25 (after the 65th iteration). Changes in ranking can happen due to patch candidates violating the specification in the new paths.
6 Related Work
Symbolic Execution.
Symbolic execution, the execution
of a program with symbolic or unknown inputs, was suggested
in 1976 as a mechanism for both program verification
and testing [16]. In the subsequent decades, decision procedures
for quantifier-free first-order logic formulas with
symbols drawn from various background theories, i.e., Satisfiability
Modulo Theories (SMT) solvers, have matured. The
maturity of back-end SMT solvers has further enabled the
development of symbolic execution engines such as KLEE [4]
and SAGE [10]. These symbolic execution engines are primarily
used for path-coverage-based software testing. However,
the efficient solving of constraints in general remains
a challenge for symbolic execution. Concolic execution [9]
represents a significant development in this regard. In concolic
execution, a given concrete test input is executed but
the symbolic formula documenting the path condition is
mutated to generate subsequent test inputs for exploration.
Since a concrete input is available, the path condition can be
simplified as needed. In the recent past, symbolic execution
has also been suggested as a specification inference mechanism
for program repair (e.g., [25]), and this suffers from the
path explosion problem of symbolic execution. Furthermore,
the repair is with respect to a given set of tests, leading to
potential overfitting. Our work on concolic program repair
adapts concolic path exploration to generate tests and reduce
candidate patches simultaneously.
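The mutation of path conditions described above can be illustrated with a toy example. The sketch below is not CPR's implementation: the two-branch program `run`, the small input domain, and the brute-force `solve` (standing in for an SMT solver) are all assumptions made for illustration.

```python
from itertools import product

# Small input domain so that brute-force search can stand in for an SMT solver.
DOMAIN = range(-5, 6)

def first_branch(x, y):
    return x > 2

def second_branch(x, y):
    return x + y == 4

def run(x, y):
    """Run the toy program on concrete inputs and record its path condition
    as a list of (branch predicate, outcome taken) pairs."""
    path = [(first_branch, first_branch(x, y))]
    if path[0][1]:  # the second branch is only reached when x > 2
        path.append((second_branch, second_branch(x, y)))
    return path

def solve(constraints):
    """Return the first input in DOMAIN x DOMAIN that satisfies every
    (predicate, expected outcome) pair, or None if there is none."""
    for x, y in product(DOMAIN, DOMAIN):
        if all(pred(x, y) == expected for pred, expected in constraints):
            return (x, y)
    return None

def explore(seed):
    """Starting from a seed input, negate each branch decision along every
    newly covered path and solve the mutated path condition for a new input."""
    worklist, covered, inputs = [seed], set(), []
    while worklist:
        x, y = worklist.pop()
        path = run(x, y)
        signature = tuple(outcome for _, outcome in path)
        if signature in covered:
            continue  # this path was already explored
        covered.add(signature)
        inputs.append((x, y))
        for i, (pred, outcome) in enumerate(path):
            mutated = path[:i] + [(pred, not outcome)]
            new_input = solve(mutated)
            if new_input is not None:
                worklist.append(new_input)
    return inputs, covered

inputs, covered = explore(seed=(0, 0))
print(inputs)        # one concrete input per covered path
print(len(covered))  # number of distinct paths covered
```

Starting from the seed `(0, 0)`, which takes neither branch, the exploration reaches all three paths of the toy program. A real concolic engine would derive the predicates by instrumenting the program and would hand the mutated path condition to a constraint solver instead of enumerating inputs.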
Program Repair.
Automated program repair [24] is an
emerging technology, which seeks to automatically rectify
program errors, typically as observed via failure of tests or
assertions. Common techniques for automated repair include
program mutations via genetic search [18], specification inference
via symbolic execution or SAT solving [12, 23, 25],
repair via abstract interpretation [19], code transplantation [28],
and learning and prioritization of patch candidates and
fix patterns [2, 20, 21, 27]. Our work is most closely related to
specification-inference-based program repair. These approaches
employ symbolic execution to generate a repair constraint,
which the buggy program needs to satisfy to pass a given
test-suite. Solutions to the repair constraint, in the form of
patch expressions, are then obtained using program synthesis.
Most of the existing works on test-based program
repair suffer from test data overfitting, where the patched
program fails for tests outside the given test-suite [14, 26].
To alleviate overfitting, one may use more general oracles
beyond tests [6], or may generate tests to rule out overfitting
patches [7]. Certain works develop customized repair
PLDI ’21, June 20–25, 2021, Virtual, Canada Ridwan Sharideen, Yannic Noller, Lars Grunske, and Abhik Roychoudhury
strategies for fixing security vulnerabilities by employing
heuristics [15], by applying fix templates that avoid
specific errors [29], or by hooking up with sanitizers [8]. In
contrast, ours is a general purpose repair engine, though we
have also shown its efficacy on the dataset of [8]. Our work
generates tests from an initial seed test by modifying the
path condition, in the style of concolic execution. However,
the path of a test contains yet to be inserted patches. Hence
the path exploration in concolic execution is accompanied
by a systematic reduction of the pool of patch candidates in
our approach. Finally, counterexample-guided inductive synthesis
(CEGIS) [1, 31, 32] represents a synthesis technique,
in which the desired solution is iteratively refined based
on a loop between a generator and a verifier. Our approach
also leverages counterexamples to reduce the patch space,
and has some relationship to CEGIS. In our work, we use
a counterexample-guided refinement of the parameter constraints
of the available patches. The work of [17] performs
concolic execution on specific tests to check whether a patch
candidate meets a specification; if it does not, the resultant
constraint is added for the generation of future repair candidates.
In contrast, CPR works on abstract patch candidates
and refines them. Furthermore, [17] terminates as soon as
there is no counterexample anymore, which again can lead
to functionality deleting patches.
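A counterexample-guided refinement of parameter constraints can be pictured as follows. This is a minimal sketch under stated assumptions, not the algorithm as implemented in CPR: the abstract patch is the hypothetical guard template `x < a` with parameter `a` in [-10, 10], the specification is crash-freedom for an access into a buffer of hypothetical length 8, inputs are assumed to be non-negative, and the parameter constraint is represented as an explicit set of values instead of a symbolic formula.

```python
BUF_SIZE = 8  # hypothetical buffer length in the toy program

def violates_spec(x, a):
    """True if the guard `x < a` lets an out-of-bounds access through on input x."""
    guard = x < a  # candidate patch instantiated with parameter value a
    return guard and not (0 <= x < BUF_SIZE)

def refine(params, x):
    """Keep only the parameter values whose concrete patch satisfies
    the crash-freedom specification on the generated input x."""
    return {a for a in params if not violates_spec(x, a)}

# Abstract patch `x < a` with a in [-10, 10]: 21 concrete patches.
params = set(range(-10, 11))

# Non-negative inputs as they might be produced by path exploration (assumed values).
for x in [9, 8, 12]:
    params = refine(params, x)
    print(f"after x={x}: {len(params)} concrete patches remain")
```

Each generated input acts as a counterexample for some parameter values and tightens the constraint, here from `a <= 10` to `a <= 8`. All remaining values are crash-free, but small values of `a` also disable legitimate accesses; a crash-freedom specification alone cannot separate them, so in this toy setting they stay in the pool.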
7 Discussion
Limitations and Extensions.
In the formulation of our
repair algorithm, as well as in our experiments, we assume
that the correct patch is included in the initial patch pool
𝑃. This is only the case if our synthesis language/grammar
covers this patch. In general, this assumption might not hold.
In such a case, our ranking still allows us to present the most
promising patches, which can only repair the program for a
portion of the input space. Our approach currently focuses
on repairing boolean and integer expressions. In future we
want to extend our work to repair complete assignments as
well as side-effect free function calls.
Inputs to our method.
Our approach requires some ingredients
that differ from existing program repair strategies:
the user-provided (partial) specification and the fault
locations (see the input description in Section 3.2). The specification
allows us to reason about many program inputs
going beyond a test suite. Other techniques rely on bug templates,
sanitizers, existing test cases, or probabilistic models
to reason about the correct behavior. Our specifications are
lightweight, and our experiments show that even simple
specifications can be used to rule out overfitting patches in
an incremental manner. The fault location information is an
input to our approach, which can be derived from statistical
fault localization. Test-based repair tools may use a set of
fault locations, while our approach currently works with one
fault location at a time.
8 Perspective
A key difficulty in program repair (and program debugging)
comes from the lack of complete specification of intended
program behavior. Since a detailed specification of correct
behavior is usually not available, existing program repair
techniques are guided by tests. This inevitably leads to the
pernicious problem of patch overfitting [26], where an automatically
generated (plausible) patch may be perfectly fitted
to pass a given set of tests, but not other tests. Herein lies the
dilemma of program repair techniques today: how to generate
a patch which works for a large set of tests, even if very
few of them may be available to guide the patch generation?
In this paper, we take a fresh look at the problem of program
repair. We note that the patches produced by current
program repair techniques may not even ensure very basic
notions of correctness such as crash-freedom, or assertions,
even when such simple specifications are readily available.
Our solution for alleviating the patch overfitting problem
is to automatically and systematically generate tests. Our
concolic exploration identifies overfitting patches that are
plausible but do not satisfy the specification for at least one
of the generated inputs. Furthermore, by removing incorrect
but plausible patches we shrink the patch space and increase
the ranking of the correct patch, alleviating patch overfitting.
Our CPR tool also applies to test-suite based repair, by using
failing / passing tests to drive concolic path exploration.
Technically, our approach suggests a dual use of symbolic
execution for search-based test generation [4, 9], and for
specification inference based program repair [23, 25]. One
could potentially replace symbolic execution with other automated
test generation techniques in our method, such as
recent systematic versions of greybox fuzzing [3].
Conceptually, we present a viewpoint of “gradual correctness”
to alleviate patch overfitting, where systematic co-exploration
of the input space and patch space leads to fewer
overfitting patches over time. This notion of gradual correctness,
as proposed for program repair in CPR, can also be
meaningful for program synthesis, recovery and transplantation.
Gradual correctness can thus help us produce high
quality automatically constructed code.
Our open-source tool and all data are publicly accessible:
https://cpr-tool.github.io
https://doi.org/10.5281/zenodo.4668317
Acknowledgments
We thank Sergey Mechtaev for valuable discussions on patch
synthesis, and help with implementation. We thank the
anonymous reviewers and our shepherd Martin Rinard for
insightful suggestions. This research is partially supported
by the National Research Foundation Singapore (National
Satellite of Excellence in Trustworthy Software Systems) and
by the German Research Foundation (GR 3634/6-1 FLASH).
References
[1] Rajeev Alur, Rishabh Singh, Dana Fisman, and Armando Solar-Lezama. 2018. Search-Based Program Synthesis. Commun. ACM 61, 12 (Nov. 2018), 84–93. https://doi.org/10.1145/3208071
[2] Johannes Bader, Andrew Scott, Michael Pradel, and Satish Chandra. 2019. Getafix: learning to fix bugs automatically. Proc. ACM Program. Lang. 3, OOPSLA (2019), 159:1–159:27. https://doi.org/10.1145/3360585
[3] Marcel Böhme, Van-Thuan Pham, Manh-Dung Nguyen, and Abhik Roychoudhury. 2017. Directed Greybox Fuzzing. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (Dallas, Texas, USA) (CCS ’17). Association for Computing Machinery, New York, NY, USA, 2329–2344. https://doi.org/10.1145/3133956.3134020
[4] Cristian Cadar, Daniel Dunbar, and Dawson Engler. 2008. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (San Diego, California) (OSDI’08). USENIX Association, USA, 209–224.
[5] Supratik Chakraborty, Kuldeep S. Meel, and Moshe Y. Vardi. 2013. A Scalable Approximate Model Counter. In Principles and Practice of Constraint Programming - 19th International Conference, CP 2013, Uppsala, Sweden, September 16-20, 2013. Proceedings (Lecture Notes in Computer Science), Christian Schulte (Ed.), Vol. 8124. Springer, 200–216. https://doi.org/10.1007/978-3-642-40627-0_18
[6] Hadar Frenkel, Orna Grumberg, Corina Pasareanu, and Sarai Sheinvald. 2020. Assume, Guarantee or Repair. In Tools and Algorithms for the Construction and Analysis of Systems, Armin Biere and David Parker (Eds.). Springer International Publishing, Cham, 211–227. https://doi.org/10.1007/978-3-030-45190-5_12
[7] Xiang Gao, Sergey Mechtaev, and Abhik Roychoudhury. 2019. Crash-Avoiding Program Repair. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA) (Beijing, China) (ISSTA 2019). Association for Computing Machinery, New York, NY, USA, 8–18. https://doi.org/10.1145/3293882.3330558
[8] Xiang Gao, Bo Wang, Gregory J. Duck, Ruyi Ji, Yingfei Xiong, and Abhik Roychoudhury. 2021. Beyond Tests: Program Vulnerability Repair via Crash Constraint Extraction. ACM Trans. Softw. Eng. Methodol. 30, 2, Article 14 (Feb. 2021), 27 pages. https://doi.org/10.1145/3418461
[9] Patrice Godefroid, Nils Klarlund, and Koushik Sen. 2005. DART: Directed Automated Random Testing. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) (Chicago, IL, USA) (PLDI ’05). Association for Computing Machinery, New York, NY, USA, 213–223. https://doi.org/10.1145/1065010.1065036
[10] Patrice Godefroid, Michael Y Levin, and David Molnar. 2012. SAGE: Whitebox Fuzzing for Security Testing. Commun. ACM 55, 3 (Mar. 2012), 40–44. https://doi.org/10.1145/2093548.2093564
[11] Carla P. Gomes, Ashish Sabharwal, and Bart Selman. 2009. Model Counting. In Handbook of Satisfiability, Armin Biere, Marijn Heule, Hans van Maaren, and Toby Walsh (Eds.). Frontiers in Artificial Intelligence and Applications, Vol. 185. IOS Press, 633–654. https://doi.org/10.3233/978-1-58603-929-5-633
[12] Divya Gopinath, Muhammad Zubair Malik, and Sarfraz Khurshid. 2011. Specification-Based Program Repair Using SAT. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS), Parosh Aziz Abdulla and K. Rustan M. Leino (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 173–188.
[13] Claire Le Goues, Neal Holtschulte, Edward K. Smith, Yuriy Brun, Premkumar T. Devanbu, Stephanie Forrest, and Westley Weimer. 2015. The ManyBugs and IntroClass Benchmarks for Automated Repair of C Programs. IEEE Trans. Software Eng. 41, 12 (2015), 1236–1256. https://doi.org/10.1109/TSE.2015.2454513
[14] Claire Le Goues, Michael Pradel, and Abhik Roychoudhury. 2019. Automated Program Repair. Commun. ACM 62, 12 (Nov. 2019), 56–65. https://doi.org/10.1145/3318162
[15] Zhen Huang, David Lie, Gang Tan, and Trent Jaeger. 2019. Using Safety Properties to Generate Vulnerability Patches. In 2019 IEEE Symposium on Security and Privacy (SP). 539–554. https://doi.org/10.1109/SP.2019.00071
[16] James C. King. 1976. Symbolic Execution and Program Testing. Commun. ACM 19, 7 (July 1976), 385–394. https://doi.org/10.1145/360248.360252
[17] Robert Könighofer and Roderick Bloem. 2013. Repair with On-The-Fly Program Analysis. In Hardware and Software: Verification and Testing, Armin Biere, Amir Nahir, and Tanja Vos (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 56–71.
[18] Claire Le Goues, ThanhVu Nguyen, Stephanie Forrest, and Westley Weimer. 2012. GenProg: A Generic Method for Automatic Software Repair. IEEE Transactions on Software Engineering 38, 1 (2012), 54–72. https://doi.org/10.1109/TSE.2011.104
[19] Francesco Logozzo and Thomas Ball. 2012. Modular and Verified Automatic Program Repair. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications (Tucson, Arizona, USA) (OOPSLA ’12). Association for Computing Machinery, New York, NY, USA, 133–146. https://doi.org/10.1145/2384616.2384626
[20] Fan Long, Peter Amidon, and Martin Rinard. 2017. Automatic Inference of Code Transforms for Patch Generation. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (Paderborn, Germany) (ESEC/FSE 2017). Association for Computing Machinery, New York, NY, USA, 727–739. https://doi.org/10.1145/3106237.3106253
[21] Fan Long and Martin Rinard. 2016. Automatic Patch Generation by Learning Correct Code. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL) (St. Petersburg, FL, USA) (POPL ’16). Association for Computing Machinery, New York, NY, USA, 298–312. https://doi.org/10.1145/2837614.2837617
[22] Fan Long and Martin C. Rinard. 2016. An analysis of the search spaces for generate and validate patch generation systems. In Proceedings of the 38th International Conference on Software Engineering, ICSE 2016, Austin, TX, USA, May 14-22, 2016, Laura K. Dillon, Willem Visser, and Laurie A. Williams (Eds.). ACM, 702–713. https://doi.org/10.1145/2884781.2884872
[23] Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. 2016. Angelix: Scalable Multiline Program Patch Synthesis via Symbolic Analysis. In Proceedings of the 38th International Conference on Software Engineering (Austin, Texas) (ICSE ’16). Association for Computing Machinery, New York, NY, USA, 691–701. https://doi.org/10.1145/2884781.2884807
[24] Martin Monperrus. 2018. Automatic Software Repair: A Bibliography. ACM Comput. Surv. 51, 1, Article 17 (Jan. 2018), 24 pages. https://doi.org/10.1145/3105906
[25] Hoang Duong Thien Nguyen, Dawei Qi, Abhik Roychoudhury, and Satish Chandra. 2013. SemFix: program repair via semantic analysis. In 35th International Conference on Software Engineering, ICSE ’13, San Francisco, CA, USA, May 18-26, 2013, David Notkin, Betty H. C. Cheng, and Klaus Pohl (Eds.). IEEE Computer Society, 772–781. https://doi.org/10.1109/ICSE.2013.6606623
[26] Zichao Qi, Fan Long, Sara Achour, and Martin Rinard. 2015. An Analysis of Patch Plausibility and Correctness for Generate-and-Validate Patch Generation Systems. In Proceedings of the 2015 International Symposium on Software Testing and Analysis (Baltimore, MD, USA) (ISSTA 2015). ACM, New York, NY, USA, 24–36. https://doi.org/10.1145/2771783.2771791
[27] Georgios Sakkas, Madeline Endres, Benjamin Cosman, Westley Weimer, and Ranjit Jhala. 2020. Type error feedback via analytic program repair. In Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 15-20, 2020, Alastair F. Donaldson and Emina
Torlak (Eds.). ACM, 16–30. https://doi.org/10.1145/3385412.3386005
[28] Stelios Sidiroglou-Douskos, Eric Lahtinen, Fan Long, and Martin Rinard. 2015. Automatic Error Elimination by Horizontal Code Transfer across Multiple Applications. SIGPLAN Not. 50, 6 (June 2015), 43–54. https://doi.org/10.1145/2813885.2737988
[29] Stelios Sidiroglou-Douskos, Eric Lahtinen, and Martin Rinard. 2015. Automatic Discovery and Patching of Buffer and Integer Overflow Errors. Technical Report. Massachusetts Institute of Technology, Cambridge, MA, USA. http://hdl.handle.net/1721.1/97087
[30] Edward K. Smith, Earl T. Barr, Claire Le Goues, and Yuriy Brun. 2015. Is the cure worse than the disease? overfitting in automated program repair. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, Bergamo, Italy, August 30 - September 4, 2015, Elisabetta Di Nitto, Mark Harman, and Patrick Heymans (Eds.). ACM, 532–543. https://doi.org/10.1145/2786805.2786825
[31] Armando Solar-Lezama. 2008. Program Synthesis by Sketching. Ph.D. Dissertation. EECS Department, University of California, Berkeley.
[32] Armando Solar-Lezama, Liviu Tancau, Rastislav Bodik, Sanjit Seshia, and Vijay Saraswat. 2006. Combinatorial sketching for finite programs. In International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS. https://doi.org/10.1145/1168857.1168907
[33] SV-COMP Website. 2020. International Competition on Software Verification (SV-COMP). https://sv-comp.sosy-lab.org/.
... Further, the constraint for memory writes and function calls [102], the weakest precondition, the extension of crash-free constraints [9] could also be computed by symbolic execution. To achieve better path coverage, concolic execution [157], which employs concrete input to drive symbolic execution, is also utilized to traverse the path driven by path constraints [99]. ...
... Constraint-Based C/C++ / / Memfix [93] http://prl.korea.ac.kr/MemFix Constraint-Based C/C++ / / CPR [99] https://cpr-tool.github.io/ Constraint-Based C/C++ ExtractFix [9] https://extractfix.github.io/ ...
Preprint
Full-text available
The increasing prevalence of software vulnerabilities necessitates automated vulnerability repair (AVR) techniques. This Systematization of Knowledge (SoK) provides a comprehensive overview of the AVR landscape, encompassing both synthetic and real-world vulnerabilities. Through a systematic literature review and quantitative benchmarking across diverse datasets, methods, and strategies, we establish a taxonomy of existing AVR methodologies, categorizing them into template-guided, search-based, constraint-based, and learning-driven approaches. We evaluate the strengths and limitations of these approaches, highlighting common challenges and practical implications. Our comprehensive analysis of existing AVR methods reveals a diverse landscape with no single ``best'' approach. Learning-based methods excel in specific scenarios but lack complete program understanding, and both learning and non-learning methods face challenges with complex vulnerabilities. Additionally, we identify emerging trends and propose future research directions to advance the field of AVR. This SoK serves as a valuable resource for researchers and practitioners, offering a structured understanding of the current state-of-the-art and guiding future research and development in this critical domain.
... A wide range of analysis-based approaches have been explored to repair software vulnerabilities [11,24,26,37,45]. GenProg [24] is one of the pioneering tools in APR. ...
Preprint
Full-text available
Critical open source software systems undergo significant validation in the form of lengthy fuzz campaigns. The fuzz campaigns typically conduct a biased random search over the domain of program inputs, to find inputs which crash the software system. Such fuzzing is useful to enhance the security of software systems in general since even closed source software may use open source components. Hence testing open source software is of paramount importance. Currently OSS-Fuzz is the most significant and widely used infrastructure for continuous validation of open source systems. Unfortunately even though OSS-Fuzz has identified more than 10,000 vulnerabilities across 1000 or more software projects, the detected vulnerabilities may remain unpatched, as vulnerability fixing is often manual in practice. In this work, we rely on the recent progress in Large Language Model (LLM) agents for autonomous program improvement including bug fixing. We customise the well-known AutoCodeRover agent for fixing security vulnerabilities. This is because LLM agents like AutoCodeRover fix bugs from issue descriptions via code search. Instead for security patching, we rely on the test execution of the exploit input to extract code elements relevant to the fix. Our experience with OSS-Fuzz vulnerability data shows that LLM agent autonomy is useful for successful security patching, as opposed to approaches like Agentless where the control flow is fixed. More importantly our findings show that we cannot measure quality of patches by code similarity of the patch with reference codes (as in CodeBLEU scores used in VulMaster), since patches with high CodeBLEU scores still fail to pass given the given exploit input. Our findings indicate that security patch correctness needs to consider dynamic attributes like test executions as opposed to relying of standard text/code similarity metrics.
Article
Deep learning (DL) components have been broadly applied in diverse applications. Similar to traditional software engineering, effective test case generation methods are needed by industry to enhance the quality and robustness of these deep learning components. To this end, we propose a novel automatic software testing technique, TAEFuzz (Automatic Fuzz -Testing via T ransferable A dversarial E xamples), which aims to automatically assess and enhance the robustness of image-based deep learning (DL) systems based on test cases generated by transferable adversarial examples. TAEFuzz alleviates the over-fitting problem during optimized test case generation and prevents test cases from prematurely falling into local optima. In addition, TAEFuzz enhances the visual quality of test cases through constraining perturbations inserted into sensitive areas of the images. For a system with low robustness, TAEFuzz trains a low-cost denoising module to reduce the impact of perturbations in transferable adversarial examples on the system. Experimental results demonstrate that the test cases generated by TAEFuzz can discover up to 46.1% more errors in the targeted systems, and ensure the visual quality of test cases. Compared to existing techniques, TAEFuzz also enhances the robustness of the target systems against transferable adversarial examples with the perturbation denoising module.
Article
Given the increasing adoption of modern AI-enabled control systems, ensuring their safety and reliability has become a critical task in software testing. One prevalent approach to testing control systems is falsification, which aims to find an input signal that causes the control system to violate a formal safety specification using optimization algorithms. However, applying falsification to AI-enabled control systems poses two significant challenges: (1) it requires the system to execute numerous candidate test inputs, which can be time-consuming, particularly for systems with AI models that have many parameters, and (2) multiple safety requirements are typically defined as a conjunctive specification, which is difficult for existing falsification approaches to comprehensively cover. This paper introduces Synthify , a falsification framework tailored for AI-enabled control systems, i.e., control systems equipped with AI controllers. Our approach performs falsification in a two-phase process. At the start, Synthify synthesizes a program that implements one or a few linear controllers to serve as a proxy for the AI controller. This proxy program mimics the AI controller's functionality but is computationally more efficient. Then, Synthify employs the ϵ\epsilon -greedy strategy to sample a promising sub-specification from the conjunctive safety specification. It then uses a Simulated Annealing-based falsification algorithm to find violations of the sampled sub-specification for the control system. To evaluate Synthify , we compare it to PSY-TaLiRo , a state-of-the-art and industrial-strength falsification tool, on 8 publicly available control systems. On average, Synthify achieves a 83.5% higher success rate in falsification compared to PSY-TaLiRo with the same budget of falsification trials. Additionally, our method is 12.8 ×\times faster in finding a single safety violation than the baseline. 
The safety violations found by Synthify are also more diverse than those found by PSY-TaLiRo , covering 137.7% more sub-specifications.
Article
Automatic programming has seen increasing popularity due to the emergence of tools like GitHub Copilot which rely on Large Language Models (LLMs). At the same time, automatically generated code faces challenges during deployment due to concerns around quality and trust. In this article, we study automated coding in a general sense and study the concerns around code quality, security and related issues of programmer responsibility. These are key issues for organizations while deciding on the usage of automatically generated code. We discuss how advances in software engineering such as program repair and analysis can enable automatic programming. We conclude with a forward looking view, focusing on the programming environment of the near future, where programmers may need to switch to different roles to fully utilize the power of automatic programming. Automated repair of automatically generated programs from LLMs, can help produce higher assurance code from LLMs, along with evidence of assurance.
Article
Security vulnerabilities detected via techniques like greybox fuzzing are often fixed with a significant time lag. This increases the exposure of the software to vulnerabilities. Automated fixing of vulnerabilities where a tool can generate fix suggestions is thus of value. In this work, we present such a tool, called CrashRepair , to automatically generate fix suggestions using concolic execution, specification inference, and search techniques. Our approach avoids generating fix suggestions merely at the crash location because such fixes often disable the manifestation of the error instead of fixing the error. Instead, based on sanitizer-guided concolic execution, we infer desired constraints at specific program locations and then opportunistically search for code mutations that help respect those constraints. Our technique only requires a single detected vulnerability or exploit as input; it does not require any user-provided properties. Evaluation results on a wide variety of CVEs in the VulnLoc benchmark, show CrashRepair achieves greater efficacy than state-of-the-art vulnerability repair tools like Senx. The repairs suggested come in the form of a ranked set of patches at different locations, and we show that on most occasions, the desired fix is among the top-3 fixes reported by CrashRepair .
Article
This work introduces EffFix, a tool that applies a novel static analysis-driven Automated Program Repair (APR) technique for fixing memory errors. APR tools typically rely on a given test-suite to guide the repair process. Apart from the need to provide test oracles, this reliance is also one of the main contributors to the over-fitting problem. Static analysis based APR techniques bypass these issues only to introduce new ones, such as soundness, scalability, and generalizability. This work demonstrates how we can overcome these challenges and achieve sound memory bug repair at scale by leveraging static analysis (specifically Incorrectness Separation Logic – ISL) to guide repair. This is the first repair approach to use ISL. Our key insight is that the abstract domain used by static analysis to detect the bugs also contains key information to derive correct patches. Our proposed approach learns what a desirable patch is by inspecting how close a patch is to fixing the bug based on the feedback from ISL based static analysis (specifically the Pulse analyzer), and turning this information into a distribution of probabilities over context free grammars. This approach to repair is generic in that its learning strategy allows for finding patches without relying on the commonly used patch templates. Furthermore, to achieve efficient program repair, instead of focusing on heuristics for reducing the search space of patches, we make repair scalable by creating classes of equivalent patches according to the effect they have on the symbolic heap. We then conduct candidate patch validation only once per patch equivalence class. This allows EffFix to efficiently discover quality repairs even in the presence of a large pool of patch candidates. 
Experimental evaluation of fixing real world memory errors in medium to large scale subjects like OpenSSL, Linux Kernel, swoole, shows the efficiency and effectiveness of EffFix — in terms of automatically producing repairs from large search spaces. In particular, EffFix has a fix ratio of 66% for memory leak bugs and 83% for Null Pointer Dereferences for the considered dataset.
Article
Most automated program repair methods rely on test cases to determine the correctness of the generated patches. However, due to the incompleteness of available test suites, some patches that pass all the test cases may still be incorrect. This issue is known as the patch overfitting problem. Overfitting problem is a longstanding problem in automated program repair. Due to overfitting patches, the patches obtained by automated program repair tools require further validation to determine their correctness. Researchers have proposed many methods to automatically assess the correctness of patches, but no systematic review provides a detailed introduction to this problem, the existing solutions, and the challenges. To address this deficiency, we systematically review the existing approaches to patch correctness assessment. We first offer a few examples of overfitting patches to acquire a more detailed understanding of this problem. We then propose a comprehensive categorization of publicly available techniques and datasets, examine the commonly used evaluation metrics, and perform an in-depth analysis of the effectiveness of the existing models in addressing the challenge of overfitting. Based on our analysis, we provided the difficulties encountered by current methodologies, alongside the possible avenues for future research exploration.
Chapter
Full-text available
We present Assume-Guarantee-Repair (AGR) – a novel framework which not only verifies that a program satisfies a set of properties, but also repairs the program in case the verification fails. We consider communicating programs – these are simple C-like programs, extended with synchronous communication actions over communication channels. Our method, which consists of a learning-based approach to assume-guarantee reasoning, performs verification and repair simultaneously: in every iteration, AGR either makes another step towards proving that the (current) system satisfies the specification, or alters the system in a way that brings it closer to satisfying the specification. We manage handling infinite-state systems by using a finite abstract representation, and reduce the semantic problems in hand – satisfying complex specifications that also contain first-order constraints – to syntactic ones, namely membership and equivalence queries for regular languages. We implemented our algorithm and evaluated it on various examples. Our experiments present compact proofs of correctness and quick repairs.
Article
Static analyzers help find bugs early by warning about recurring bug categories. While fixing these bugs still remains a mostly manual task in practice, we observe that fixes for a specific bug category often are repetitive. This paper addresses the problem of automatically fixing instances of common bugs by learning from past fixes. We present Getafix, an approach that produces human-like fixes while being fast enough to suggest fixes in time proportional to the amount of time needed to obtain static analysis results in the first place. Getafix is based on a novel hierarchical clustering algorithm that summarizes fix patterns into a hierarchy ranging from general to specific patterns. Instead of an expensive exploration of a potentially large space of candidate fixes, Getafix uses a simple yet effective ranking technique that uses the context of a code change to select the most appropriate fix for a given bug. Our evaluation applies Getafix to 1,268 bug fixes for six bug categories reported by popular static analyzers for Java, including null dereferences, incorrect API calls, and misuses of particular language constructs. The approach predicts exactly the human-written fix as the top-most suggestion between 12% and 91% of the time, depending on the bug category. The top-5 suggestions contain fixes for 526 of the 1,268 bugs. Moreover, we report on deploying the approach within Facebook, where it contributes to the reliability of software used by billions of people. To the best of our knowledge, Getafix is the first industrially-deployed automated bug-fixing tool that learns fix patterns from past, human-written fixes to produce human-like fixes.
Conference Paper
Security vulnerabilities are among the most critical software defects in existence. When identified, programmers aim to produce patches that prevent the vulnerability as quickly as possible, motivating the need for automatic program repair (APR) methods to generate patches automatically. Unfortunately, most current APR methods fall short because they approximate the properties necessary to prevent the vulnerability using examples. Approximations result in patches that either do not fix the vulnerability comprehensively, or may even introduce new bugs. Instead, we propose property-based APR, which uses human-specified, program-independent and vulnerability-specific safety properties to derive source code patches for security vulnerabilities. Unlike properties that are approximated by observing the execution of test cases, such safety properties are precise and complete. The primary challenge lies in mapping such safety properties into source code patches that can be instantiated into an existing program. To address these challenges, we propose Senx, which, given a set of safety properties and a single input that triggers the vulnerability, detects the safety property violated by the vulnerability input and generates a corresponding patch that enforces the safety property and thus, removes the vulnerability. Senx solves several challenges with property-based APR: it identifies the program expressions and variables that must be evaluated to check safety properties and identifies the program scopes where they can be evaluated, it generates new code to selectively compute the values it needs if calling existing program code would cause unwanted side effects, and it uses a novel access range analysis technique to avoid placing patches inside loops where it could incur performance overhead.
Our evaluation shows that the patches generated by Senx successfully fix 32 of 42 real-world vulnerabilities from 11 applications including various tools or libraries for manipulating graphics/media files, a programming language interpreter, a relational database engine, a collection of programming tools for creating and managing binary programs, and a collection of basic file, shell, and text manipulation tools.
Article
Automated program repair is an emerging technology that seeks to automatically rectify program errors and vulnerabilities. Repair techniques are driven by a correctness criterion that is often in the form of a test suite. Such test-based repair may produce overfitting patches, where the patches produced fail on tests outside the test suite driving the repair. In this work, we present a repair method that fixes program vulnerabilities without the need for a voluminous test suite. Given a vulnerability as evidenced by an exploit, the technique extracts a constraint representing the vulnerability with the help of sanitizers. The extracted constraint serves as a proof obligation that our synthesized patch should satisfy. The proof obligation is met by propagating the extracted constraint to locations that are deemed to be “suitable” fix locations. An implementation of our approach (ExtractFix) on top of the KLEE symbolic execution engine shows its efficacy in fixing a wide range of vulnerabilities taken from the ManyBugs benchmark, real-world CVEs and Google’s OSS-Fuzz framework. We believe that our work presents a way forward for the overfitting problem in program repair by generalizing observable hazards/vulnerabilities (as constraint) from a single failing test or exploit.
Article
Automated program repair can relieve programmers from the burden of manually fixing the ever-increasing number of programming mistakes.
Conference Paper
Existing program repair systems modify a buggy program so that the modified program passes given tests. The repaired program may not satisfy even the most basic notion of correctness, namely crash-freedom. In other words, repair tools might generate patches which over-fit the test data driving the repair, and the automatically repaired programs may even introduce crashes or vulnerabilities. We propose an integrated approach for detecting and discarding crashing patches. Our approach fuses test and patch generation into a single process, in which patches are generated with the objective of passing existing tests, and new tests are generated with the objective of filtering out over-fitted patches by distinguishing candidate patches in terms of behavior. We use crash-freedom as the oracle to discard patch candidates which crash on the new tests. At its core, our approach defines a grey-box fuzzing strategy that gives higher priority to new tests that separate patches behaving equivalently on existing tests. This test generation strategy identifies semantic differences between patch candidates, and reduces over-fitting in program repair. We evaluated our approach on real-world vulnerabilities and open-source subjects from the Google OSS-Fuzz infrastructure. We found that our tool Fix2Fit (implementing patch space directed test generation) produces crash-avoiding patches. While we do not give formal guarantees about crash-freedom, cross-validation with fuzzing tools and their sanitizers provides greater confidence about the crash-freedom of our suggested patches.
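The idea of separating patches that behave equivalently on existing tests, and discarding those that crash on new tests, can be illustrated with a small sketch. This is not the Fix2Fit implementation: the function `refine_partitions` and the three toy patches are hypothetical names introduced here, patches are modelled as plain Python functions, and a raised exception stands in for a crash detected by a sanitizer.

```python
def refine_partitions(partitions, test_input):
    """Split each group of patches by observed behavior on one new test.

    Patches that raise an exception on the test are treated as crashing
    and are dropped (crash-freedom used as the oracle).
    """
    refined = []
    for group in partitions:
        by_outcome = {}
        for patch in group:
            try:
                outcome = patch(test_input)
            except Exception:
                continue  # crashing patch: discard it
            by_outcome.setdefault(outcome, []).append(patch)
        refined.extend(by_outcome.values())
    return refined


# Three hypothetical candidate patches for a buggy `100 // x`.
# All three return 20 on the existing test input x = 5.
def patch_a(x):
    return 100 // x if x != 0 else 0


def patch_b(x):
    return 100 // x if x > 0 else 0


def patch_c(x):
    return 100 // x if x != 1 else 0  # still divides by zero for x = 0


initial = [[patch_a, patch_b, patch_c]]     # one group: equivalent so far
after_zero = refine_partitions(initial, 0)  # patch_c crashes and is dropped
after_negative = refine_partitions(after_zero, -4)  # patch_a and patch_b split
```

A test generator in this spirit would favor inputs such as `0` and `-4`, because each either removes a crashing patch or splits a group of patches that no earlier test could tell apart.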
Article
Writing programs that are both correct and efficient is challenging. A potential solution lies in program synthesis aimed at automatic derivation of an executable implementation (the “how”) from a high-level logical specification of the desired input-to-output behavior (the “what”). A mature synthesis technology can have a transformative impact on programmer productivity by liberating the programmer from low-level coding details. For instance, for the classical computational problem of sorting a list of numbers, the programmer has to simply specify that given an input array A of n numbers, compute an output array B consisting of exactly the same numbers as A such that B[i] ≤ B[i + 1] for 1 ≤ i < n, leaving it to the synthesizer to figure out the sequence of steps needed for the desired computation. Traditionally, program synthesis is formalized as a problem in deductive theorem proving:¹⁷ A program is derived from the constructive proof of the theorem that states that for all inputs, there exists an output, such that the desired correctness specification holds. Building automated and scalable tools to solve this problem has proved to be difficult. A recent alternative to formalizing synthesis allows the programmer to supplement the logical specification with a syntactic template that constrains the space of allowed implementations and the solution strategies focus on search algorithms for efficiently exploring this space. The resulting search-based program synthesis paradigm is emerging as an enabling technology for both designing more intuitive programming notations and aggressive program optimizations.
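The sorting specification in this abstract can be written down as an executable check. The sketch below is only an illustration of the "what" side of the example: the name `satisfies_sorting_spec` is hypothetical, Python's 0-based indexing replaces the 1-based indices of the abstract, and no synthesizer is involved.

```python
from collections import Counter


def satisfies_sorting_spec(a, b):
    """Check the specification: b holds exactly the same numbers as a,
    and b[i] <= b[i + 1] for every adjacent pair of positions."""
    same_numbers = Counter(a) == Counter(b)
    ordered = all(b[i] <= b[i + 1] for i in range(len(b) - 1))
    return same_numbers and ordered


# A candidate implementation (the "how") is acceptable only if its output
# satisfies the specification on every input; here is a single input.
candidate_output = sorted([3, 1, 2])
spec_holds = satisfies_sorting_spec([3, 1, 2], candidate_output)
```

Both conjuncts are needed: an ordered output with different elements, or a permutation that is out of order, fails the check.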
Conference Paper
Existing Greybox Fuzzers (GF) cannot be effectively directed, for instance, towards problematic changes or patches, towards critical system calls or dangerous locations, or towards functions in the stack-trace of a reported vulnerability that we wish to reproduce. In this paper, we introduce Directed Greybox Fuzzing (DGF) which generates inputs with the objective of reaching a given set of target program locations efficiently. We develop and evaluate a simulated annealing-based power schedule that gradually assigns more energy to seeds that are closer to the target locations while reducing energy for seeds that are further away. Experiments with our implementation AFLGo demonstrate that DGF outperforms both directed symbolic-execution-based whitebox fuzzing and undirected greybox fuzzing. We show applications of DGF to patch testing and crash reproduction, and discuss the integration of AFLGo into Google's continuous fuzzing platform OSS-Fuzz. Due to its directedness, AFLGo could find 39 bugs in several well-fuzzed, security-critical projects like LibXML2. 17 CVEs were assigned.
Conference Paper
We present a new system, Genesis, that processes human patches to automatically infer code transforms for automatic patch generation. We present results that characterize the effectiveness of the Genesis inference algorithms and the complete Genesis patch generation system working with real-world patches and defects collected from 372 Java projects. To the best of our knowledge, Genesis is the first system to automatically infer patch generation transforms or candidate patch search spaces from previous successful patches.