Some Notes on Software Diversification and Test
Amplification Using Machine Learning Methods
Abstract—The application of machine learning methods has proven to be a successful approach for managing a wide variety of computer science problems. The aim of this technical report is to present some ideas related to the analysis of source code and software systems using machine learning techniques. In particular, we focus our study on applications to (1) software diversification and (2) automatic test amplification. Due to the nature of these two problems, machine learning methods are suitable tools to support their development. We review part of the existing literature and discuss new ideas regarding these issues, which could serve as a starting point for further research in both fields.
Keywords—machine learning, software diversification, automatic test amplification
I. INTRODUCTION
Machine learning refers to the detection of meaningful patterns in data. For many real scenarios, due to the complexity of the patterns that need to be detected, a human programmer cannot provide an explicit, finely detailed specification of how such tasks should be executed. Taking example from intelligent beings, many of our skills are acquired or refined through learning from experience (rather than by following explicit instructions given to us). Machine learning methods are concerned with endowing programs with the ability to “learn” and adapt from the environment.
Machine learning provides the technical basis of data mining, where it has been widely used to extract information from the raw data present in databases. The goal is to construct models that adapt to changes in the system and infer valuable information from data for some specific purpose. Thus, it involves the implementation of algorithms that can obtain useful knowledge from data without relying on static rule-based programming. The kind of knowledge obtained can be used for prediction, explanation, or understanding of the data. The learning process can be guided in three different ways: supervised, semi-supervised, and unsupervised. Supervised learning deals with approximating a target function from labeled examples (e.g., lazy learning, decision trees, Bayesian learning, neural networks, support vector machines). Unsupervised learning attempts to learn patterns and associations from a set of objects that do not have attached class labels (e.g., clustering, association rules). Semi-supervised learning consists of learning from a combination of labeled and unlabeled examples (e.g., expectation-maximization with generative mixture models, self-training, co-training, transductive support vector machines).
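As a toy illustration of the first two settings, the sketch below contrasts a supervised nearest-neighbor prediction with a crude unsupervised grouping. The data, labels, and distance threshold are invented for the example and carry no meaning beyond it.

```python
# Toy illustration of the supervised vs. unsupervised settings,
# in plain Python (no external machine learning library assumed).

def nn_classify(train, query):
    """Supervised: predict the label of `query` from labeled examples
    (x, y, label) by returning the label of its nearest neighbor."""
    def dist(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return min(train, key=lambda ex: dist(ex[:2], query))[2]

def cluster_1d(points, threshold):
    """Unsupervised: group unlabeled 1-D points whose gap to the
    previous point is below `threshold` (a crude clustering)."""
    pts = sorted(points)
    clusters = [[pts[0]]]
    for p in pts[1:]:
        if p - clusters[-1][-1] <= threshold:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return clusters

labeled = [(0.0, 0.0, "spam"), (1.0, 1.0, "spam"), (5.0, 5.0, "ham")]
print(nn_classify(labeled, (4.5, 5.2)))       # 'ham'
print(cluster_1d([0.1, 0.2, 5.0, 5.1], 1.0))  # two clusters
```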
In the past couple of decades, machine learning has become a common tool in almost any task that requires information extraction from large datasets. Machine learning algorithms have proven to be of great practical value in a variety of application domains, such as natural language processing, pattern recognition in images or surveillance videos, detection of web spam, genetics and genomics, etc. The field of software engineering turns out to be a fertile ground where many software development and maintenance tasks could be formulated as learning problems and approached in terms of learning algorithms.
Machine learning algorithms can give us insights into software processes and products, such as which software modules are most likely to contain bugs, how much effort is likely to be required to develop new software projects, which commits are most likely to induce crashes, how the productivity of a company changes over time, how to improve productivity, etc.
We see a wealth of opportunities in this research area. Recently, the study of software diversification has emerged as an active research field in software engineering. In this report, we address the problem of diversifying software systems and present some ideas about it using machine learning paradigms. Furthermore, we study the application of machine learning techniques to the problem of automatic test generation, which aims to enhance the coverage of test cases with respect to a well-defined engineering goal.
This technical report is organized as follows. Section II reviews the problem of software diversification and discusses some ideas regarding the application of machine learning concepts in this area. Section III summarizes the automatic test amplification procedure and gives some research insights into the use of machine learning in the search-based test amplification field. Finally, Section IV concludes the report and outlines some issues for future research.
II. SOFTWARE DIVERSIFICATION
During the traditional flow of software development, all instances (or clones) of a piece of software are commonly deployed using the same code and logic. Accordingly, an attack that works successfully against one of its instances also works against all of them. Thus, the robustness and security of the systems are affected by the feasibility of replicating an attack on any of the instances. Software diversification provides a viable solution to manage this issue, representing a natural evolution of the traditional software construction and deployment paradigms. Software diversification consists of modifying software in order to create many instances of the same software, but with different implementations, while providing identical functionality. Software diversification aims to increase the adaptability and robustness of the whole software system, making it more secure and resistant to attacks or perturbations in the environment.
When a software system is diversified, since the generated instances contain different code and logic, an attack on one instance is not guaranteed to work on another instance. This can slow the spread of attacks and mislead attackers, increasing the security of the software system and preventing attackers from exploiting it massively. For instance, diverse computing environments decrease the chances of successful worm or botnet attacks, as both types of attack rely on uniform and compatible environments in order to replicate (worm) or initiate (botnet). Diversification can therefore be viewed as an obfuscation technique for increasing the robustness and adaptability of software systems. Its main goal is to promote adaptive capacities in the face of unforeseen structural and environmental changes.
The first experiments with software diversification investigated its advantages for fault tolerance. More recently, source code randomization has gained attention for cyber security. Baudry et al. extended the concept of software diversity to a wide range of facets. For instance, natural software diversification is perceived as a form of software diversity that could emerge spontaneously during development and is more common in open source communities. On the other hand, automated software diversification consists of techniques for artificially and automatically synthesizing the diversity.
Many recent works have experimented with the integration of multiple forms of diversity in software systems in order to benefit from several forms of protection. The idea is to use biological evolutionary mechanisms with the aim of facilitating the emergence of multiple forms of software diversity in collaborative adaptive systems, through automatic transformation and evolution. For example, Allier et al. aim to break the application monoculture of web applications by promoting multi-tier diversification, combining natural and automatic diversity, in a realistic web-based architectural setting. In this scenario, the expected outcome is a set of software evolution and maintenance methods that spontaneously sustain diversity in collaborative adaptive systems.
One of the current problems of software diversification is determining better ways to explore the space of transformations. In addition, there is no clear consensus on how to measure and evaluate the quality of the diversity for different scenarios. Test suites exercise diversification in different regions of programs in very unequal ways (i.e., diversification performs differently on a statement that is covered by one hundred test cases than on a statement that is covered by a single test case). Furthermore, diversification has a direct impact on distribution and maintenance. For example, when the binary code of an application must be signed by a third party, the production of millions of diverse variants becomes a challenge. Another example is dump trace analysis or incremental updates. These will require accurate traceability of variants and reversible code transformations, as well as new forms of code analysis for automatic patching.
The application of machine learning methods represents an attractive approach to handle some of these concerns in software diversification. Examples include extending the space of transformations through the search for more suitable variants using some heuristic mechanism, clustering functionalities that are most closely related to each other in order to achieve a more specific diversification of the system, or analyzing the behavior of the natural diversity in software repositories. In the following, we address some of these approaches in more detail.
A. Generation of synthetic diversity
Recent work focuses on the automatic synthesis of software instances in order to maximize the potential impact of diversification on a system's resilience. For example, Feldt used genetic programming to automatically synthesize variants of an aircraft controller in order to achieve better failure diversity; Rinard et al. developed unsound program transformations that support the runtime production of diversity and handle changes in quality of service; and Forrest et al. have explored genetic programming for automatic bug fixing and neutral mutations.
Another interesting approach to the generation of synthetic diversity is the work of Baudry et al. They create “sosie” programs, which are variants of a program that exhibit the same functionality, passing the same test suite, but are computationally diverse in control statements or data flow. The generation of these sosies is based on the transformation of the original program through statement deletion, addition, or replacement.
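A minimal sketch of the statement-deletion transformation is shown below, using Python's `ast` module. The example function, the pass/fail predicates standing in for a test suite, and the helper names are all illustrative assumptions, not the actual sosie tooling (which operates on Java programs).

```python
import ast
import copy

def statement_deletion_variants(source):
    """Yield variants of `source` with one function-body statement
    deleted (the 'delete' flavor of sosie transformation)."""
    tree = ast.parse(source)
    func = tree.body[0]  # assume a single function definition
    for i in range(len(func.body)):
        if len(func.body) == 1:
            break  # a function body must keep at least one statement
        variant = copy.deepcopy(tree)
        del variant.body[0].body[i]
        yield ast.unparse(variant)

def is_sosie(variant_src, test_suite):
    """A variant is a sosie candidate if it still passes the suite."""
    ns = {}
    try:
        exec(variant_src, ns)
        return all(t(ns["f"]) for t in test_suite)
    except Exception:
        return False

original = "def f(x):\n    y = 0\n    y = x * 2\n    return y"
tests = [lambda f: f(2) == 4, lambda f: f(0) == 0]
sosies = [v for v in statement_deletion_variants(original)
          if is_sosie(v, tests)]
print(len(sosies))  # 1: deleting the dead 'y = 0' statement yields a sosie
```

Note that `ast.unparse` requires Python 3.9 or later; in a real setting the variant would also need to be computationally distinct, which this sketch does not check.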
From the perspective of machine learning, software instances are perceived as sources of information from which to learn. In some manner, diversified software versions could be handled as machine learning instances. This approach broadens the vision of software as an entity from which useful knowledge can be gained.
The problem resembles a representation task: source code is unambiguous and highly structured. There have been several efforts in the field of code mining and code analysis, such as code representation using abstract syntax trees (ASTs), control flow graphs (CFGs), or even XML formats. The purpose is to explain how code instructions compose into a higher-level meaning, which results in useful software engineering tools that help with code construction and maintenance.
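As a hedged sketch of such a representation, the snippet below maps a program to a bag of AST node types, one of the simplest fixed-form feature vectors one could feed to a learner. The example programs and the choice of features are illustrative only.

```python
import ast
from collections import Counter

def ast_features(source):
    """Represent a program as a bag of AST node types: a crude but
    fixed-form feature vector for downstream machine learning."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))

a = ast_features("def f(x):\n    return x + 1")
b = ast_features("def g(y):\n    return y + 2")
# Structurally identical programs map to the same representation,
# even though their identifiers and constants differ.
print(a == b)  # True
```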
The generation of synthetic software versions using machine learning methods has several advantages over other arbitrary randomization approaches. These techniques aim to maximize the potential impact of diversification on the system's resilience. This is a first step towards the more general goal of developing machine learning methods that learn through the use of some code representation. Additionally, software diversification comprises a new modality for machine learning mechanisms, with different characteristics compared to images and natural language. Models based on source code analysis and software transformation fall into a new branch of methods that have interesting parallels to the traditional processing of images and natural language.
An interesting approach to match software diversification with machine learning could be the way of generating synthetic instances. Many supervised machine learning applications present problems when learning from imbalanced datasets. The SMOTE algorithm, proposed by Chawla et al., is a popular method of over-sampling by generating synthetic instances, avoiding the overfitting caused by random over-sampling. SMOTE generates synthetic instances by interpolating between minority examples that lie together, making the decision regions larger towards the majority class and less specific. Synthetic examples are introduced along the line segment between each minority-class example and one of its k minority-class nearest neighbors. The generation procedure for each minority-class example consists of (1) choosing one of its k minority-class nearest neighbors, (2) taking the difference between the two feature vectors, and (3) multiplying the difference by a random number between 0 and 1 and adding it to the example.
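The three-step procedure above can be sketched as follows. The toy minority set, the squared-distance function, and the fixed random seed are assumptions made for the example, not part of the reference SMOTE implementation.

```python
import random

def smote_sample(minority, k=2, seed=0):
    """Generate one synthetic minority example following SMOTE:
    (1) pick one of the k nearest minority neighbors of a base
    example, (2) take the vector difference, and (3) scale it by a
    random number in [0, 1] and add it to the base example."""
    rng = random.Random(seed)
    base = rng.choice(minority)
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    neighbors = sorted((p for p in minority if p != base),
                       key=lambda p: dist(p, base))[:k]
    neighbor = rng.choice(neighbors)
    gap = rng.random()
    return tuple(a + gap * (b - a) for a, b in zip(base, neighbor))

minority = [(1.0, 1.0), (2.0, 2.0), (1.5, 1.0)]
synthetic = smote_sample(minority)
# The synthetic point lies on the segment between two real examples,
# so every coordinate stays inside the minority bounding box.
print(synthetic)
```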
We believe that the use of techniques similar to SMOTE could improve the quality of the synthetic software instances during the diversification process. Furthermore, a heuristically guided search of the space of transformations could generate, or even improve, these software instances (e.g., sosies that pass a larger test suite than the original version or that present a better representation or structure). However, as could be expected, representing software systems as instances for machine learning is a very challenging task.
B. Diversification in ensemble learning
In machine learning, the ensemble methodology consists of training a set of individual models using multiple learning algorithms and merging their results to obtain better predictive performance (e.g., decreasing the error rate or improving accuracy). Ensemble learning can be seen as a learning strategy that addresses inadequacies in the training data. Ultimately, an ensemble is less likely to misclassify than a single component function. This approach is typical of supervised learning, in which fast algorithms such as decision trees are commonly used to improve the decision boundary. Similarly, ensemble techniques have been used in unsupervised learning scenarios, for example, in consensus clustering or in anomaly detection. In any case, classifier ensembles have proven to significantly improve the accuracy of a single classifier.
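A small numerical illustration of this claim, assuming independent component errors (a strong assumption that rarely holds exactly in practice): a majority vote over many weak-but-better-than-chance classifiers is far more accurate than any one of them.

```python
from math import comb

def majority_vote_accuracy(p, n):
    """Probability that a majority of n independent classifiers,
    each correct with probability p, votes for the right class."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

single = 0.7
ensemble = majority_vote_accuracy(single, 11)
print(round(ensemble, 3))  # 0.922: well above the single classifier
```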
There are two approaches to ensemble construction. One is to combine component functions that are homogeneous (derived using the same learning algorithm and defined in the same representation formalism, for example, an ensemble of functions derived by decision tree methods). Another approach is to combine component functions that are heterogeneous (derived by different learning algorithms and represented in different formalisms, for example, an ensemble of functions derived by decision trees, instance-based learning, Bayesian learning, and neural networks). Two main issues exist in ensemble learning: ensemble construction and classification combination. Bagging, cross-validation, and boosting are common methods for constructing ensembles, while weighted and unweighted voting are used for combining classifications. The AdaBoost algorithm is recognized as one of the best methods for constructing ensembles of decision trees.
Both homogeneous and heterogeneous ensembles could be perceived as a special case of software diversity. They share the same functionality (classification or prediction) but are based on the use of different learning algorithms (heterogeneous) or trained with different subsets of the data (homogeneous). Empirically, it has been shown that ensembles tend to yield better results when there is significant diversity among the models used. However, it is still not clear how diversity affects classification performance, especially on minority classes. Because diversity affects a classifier ensemble's generalization ability, any reduction process over its members must retain the ensemble's diversity. If each classifier in an ensemble produces a very similar performance, the ensemble may not improve its generalization ability. On the other hand, if an instance is classified into the wrong category by one classifier of an ensemble, other classifiers within the same ensemble may correct the wrong classification when their results are combined.
Many diversity measures for ensemble learning have been proposed. For example, Yao et al. present a new ensemble subset evaluation method that integrates classifier diversity measures into a novel classifier ensemble reduction framework. While few papers compare different diversifying heuristics, it could be interesting to establish similarities between ensemble learning measures of diversity and the software diversification paradigm. The goal is to obtain more general criteria on the quality of software system diversity and to explore the diversification quality of generated variants.
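As an illustration of such measures, the sketch below computes the pairwise disagreement measure, one of the simplest diversity measures in the ensemble literature. The toy prediction vectors are invented for the example.

```python
def disagreement(preds_a, preds_b):
    """Pairwise disagreement: fraction of instances on which two
    classifiers predict differently (0 = identical behavior)."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def mean_pairwise_diversity(all_preds):
    """Average disagreement over all classifier pairs in an ensemble."""
    n = len(all_preds)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(disagreement(all_preds[i], all_preds[j])
               for i, j in pairs) / len(pairs)

# Predictions of three classifiers on the same four instances.
preds = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
print(mean_pairwise_diversity(preds))  # 1/3: a moderately diverse trio
```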
C. Exploration of the diversification space
The number of possible transformations during software diversification is unlimited. Accordingly, exploring the entire space of possible variants is an unaffordable endeavor (this is the main motivation for the application of diversification to security). Meta-heuristic search algorithms, such as evolutionary algorithms, could be used to perform a more pervasive diversification. A big question is how to identify the software engineering principles and evolution rules that drive the emergence and the constant renewal of diversity in software ecosystems.
The mining of software repositories is a relatively new research field that links software engineering to data mining. Its goal is to analyze the rich data available in software repositories to uncover interesting and actionable information about software systems, projects, and software engineering in general. Some commonly explored areas include software evolution, models of software development processes, characterization of developers' behavior and their activities, prediction of future software qualities, use of machine learning techniques on software project data, software bug prediction, analysis of software change patterns, and analysis of code clones.
Repository mining offers a vast set of tools for analyzing natural software diversity in several software ecosystems, across multiple projects and platforms. We can now explore the different facets of software diversity empirically, at a much larger scale. This includes not only the analysis of software as a product itself, but also the human interactions among developers, to understand the way they perceive and conceive the software during the development and workflow of a project.
The analysis of natural diversity using the techniques offered by software repository mining could serve as a baseline for generating synthetic diversity. Furthermore, data in software repositories represent a natural field for the application of machine learning methods, big data analysis, and deep learning to software engineering. Some interesting applications include prediction of software defects using classification and regression, clustering of similar development patterns and code reuse, analysis of natural language artifacts and interactions among developers, empirical studies on extracting knowledge from large projects via association rule mining, visualization techniques to summarize source code data, etc.
III. AUTOMATIC TEST AMPLIFICATION
Software testing is closely related to software quality. Several testing methodologies have been implemented to verify the correctness of a software system and ensure that a program meets certain specifications. While software testing is a significant step during the development process, it is also very expensive, as it should take place throughout the whole software development cycle. Various studies indicate that the time and effort spent on software testing is usually greater than the overall system implementation cost.
Automatic test generation is a traditional subject of software testing. Its aim is to provide faster and cheaper testing by generating more efficient and accurate test cases, without requiring special skills or knowledge of the system's behavior. Automatic testing also frees the testing activities from cognitive bias and can produce fewer errors during testing.
The increasing use and expansion of strong testing frameworks, such as JUnit for Java, has promoted a vast amount of manually well-written test cases. In this context, test amplification has gained special attention. This is a special variant of automatic test generation in which pre-existing test cases are used to assist the automated generation of additional test cases. The objective of test amplification techniques is to improve the value of existing test suites with respect to a specific engineering goal (e.g., increasing test coverage, improving observability, assessing properties of the test suites, or detecting faults).
Despite the recent progress made in this field, many challenges remain open. For instance, there are difficulties in making better use of the information contained in the existing tests in order to synthesize new ones. Furthermore, it is not clear how changes to the existing test statements affect the quality of the test suites, which has several implications in terms of scalability of the testing system. On the other hand, Danglot et al. also note the absence of works comparing traditional test generation (generating test cases from scratch) and test amplification (generating tests from existing tests).
In this context, machine learning methods are interesting tools in the domain of test amplification. Several approaches have been proposed that apply mutations to the existing tests to effectively generate new test cases. Search-based methods represent a more efficient way of exploring the testing input requirements, in order to tackle the almost inevitable updates of the software system. In this section, we address some ideas about the application of machine learning methods for improving test amplification tasks.
A. Search-based test amplification
Search-based test data generation is a form of dynamic testing in which additional test data are synthesized following some search heuristic. The idea of using existing test data in order to generate additional test examples lends itself very well to search-based software testing. Meta-heuristic search algorithms have proven to achieve great success in the analysis and expansion of the search space.
Genetic algorithms are the most widely used strategy to generate synthetic test cases that satisfy desired testing requirements. For this particular purpose, the algorithm does not search for a single optimal solution; instead, it automatically searches the space for suitable test cases while a fitness function that evaluates the requirements is continuously updated. Premature convergence to local optima has been a common problem in genetic algorithms so far. To overcome this problem, many improvements to the fitness function have been proposed.
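A minimal sketch of this search scheme, using a toy branch-distance fitness over a hypothetical `if x == 42` branch in the program under test. The population size, mutation range, and operators are arbitrary choices made for illustration, not a recommended configuration.

```python
import random

def branch_distance(x, target=42):
    """Fitness for covering a hypothetical `if x == target` branch:
    a smaller distance means the input is closer to taking it."""
    return abs(x - target)

def ga_test_input(pop_size=20, generations=200, seed=1):
    """Tiny genetic algorithm searching for an input that covers the
    target branch; lower branch distance is better."""
    rng = random.Random(seed)
    pop = [rng.randint(0, 100) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=branch_distance)
        if branch_distance(pop[0]) == 0:
            break  # the branch is covered
        survivors = pop[: pop_size // 2]
        children = [(rng.choice(survivors) + rng.choice(survivors)) // 2
                    + rng.randint(-3, 3)  # averaging crossover + mutation
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return min(pop, key=branch_distance)

best = ga_test_input()
print(best, branch_distance(best))
```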
Baudry et al. presented the bacteriologic algorithm for test case optimization. The algorithm applies several mutations to an initial test suite and incrementally evolves an improved test suite, considered superior to the original one in terms of a previously defined mutation score. In this manner, most meta-heuristic algorithms that have been used for test data generation require one or more initial solutions from which to start the search.
In their work, Yoo and Harman propose a search-based test data regeneration algorithm based on the hill climbing strategy, which adopts random restarts in order to avoid local optima. This test data regeneration technique assumes that the existing test data belong to global optima, and, therefore, always starts from a global optimum that corresponds to the existing test data. Interestingly, they found that the mutation faults detected by the regenerated test data differ from those detected by the original test suite.
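The basic climbing step (without the random-restart machinery) can be sketched as follows. The fitness and neighborhood functions are invented stand-ins for real coverage-driven objectives.

```python
def hill_climb(start, fitness, neighbors, max_steps=1000):
    """Simple hill climbing from an existing test input: move to the
    best neighbor while doing so improves the fitness."""
    current = start
    for _ in range(max_steps):
        best = max(neighbors(current), key=fitness)
        if fitness(best) <= fitness(current):
            return current  # local optimum reached
        current = best
    return current

# Hypothetical fitness: how close an input gets to a boundary value.
fitness = lambda x: -abs(x - 10)
neighbors = lambda x: [x - 1, x + 1]
print(hill_climb(3, fitness, neighbors))  # climbs from 3 to 10
```

Random restart would simply rerun `hill_climb` from fresh starting points and keep the best result, which is how the technique escapes local optima.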
There is a vast amount of additional meta-heuristic and optimization techniques that could be used for refining the search space in accordance with some coverage criteria (e.g., ant colony optimization, the firefly algorithm, particle swarm optimization, simulated annealing, the artificial bee colony algorithm). As in ensemble learning, a hybrid approach using these techniques may offer a better overall exploration of the search space.
In many real testing scenarios, a single-objective optimization approach is unrealistic. Developers usually want to find test sets that meet several objectives simultaneously in order to maximize the value obtained from the inherently expensive process of running the test cases and examining the output they produce. Several multi-objective evolutionary algorithms have been applied to the test data generation problem. However, as far as we know, no work on multi-objective test amplification has been reported in the literature.
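To make the multi-objective setting concrete, the sketch below filters a set of candidate test suites down to its Pareto front of non-dominated solutions. The objective tuples are fabricated example scores, not measurements from any real system.

```python
def dominates(a, b):
    """Pareto dominance for maximization: `a` dominates `b` if it is
    at least as good on every objective and strictly better on one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(points):
    """Non-dominated candidates among test suites scored on several
    objectives (e.g., coverage, mutants killed, negated runtime)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Each tuple: (branch coverage, mutants killed, -execution time).
candidates = [(0.9, 40, -12), (0.8, 55, -9), (0.7, 30, -20), (0.9, 40, -15)]
print(pareto_front(candidates))  # [(0.9, 40, -12), (0.8, 55, -9)]
```

A multi-objective amplifier would keep evolving this front rather than a single best suite, leaving the trade-off decision to the developer.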
On the other hand, as testing can only detect the existence of faults, not their absence, executing additional test cases can only increase the confidence in the program under test. Furthermore, it may be possible to utilize test data regeneration not only to create more test cases, but also to improve existing test suites. In summary, meta-heuristic techniques are good because they reward individuals with high scores, but they do not favor diversity, and the search may converge to local optima.
To avoid this drawback, Boussaa et al. proposed a new search-based approach for test data generation with the goal of achieving more diversity in the testing space. The idea consists of an adaptation of the Novelty Search algorithm. They define a new distance-based measure of novelty in order to maximize a fitness function that evaluates generated test cases. Thereby, individuals in the evolving population are selected based on how different they are from other solutions evaluated so far.
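A minimal sketch of such a distance-based novelty score is shown below; a common formulation in the novelty search literature scores a candidate by its mean distance to the k nearest previously evaluated individuals. The one-dimensional individuals and the archive are invented for illustration.

```python
def novelty(candidate, archive, k=3):
    """Novelty score: mean distance to the k nearest previously
    evaluated individuals (higher = more novel)."""
    if not archive:
        return float("inf")  # nothing seen yet: maximally novel
    nearest = sorted(abs(candidate - a) for a in archive)[:k]
    return sum(nearest) / len(nearest)

archive = [1.0, 1.1, 1.2, 5.0]
# The candidate far from the crowded region scores higher, so the
# search is pushed away from already-explored test inputs.
print(novelty(9.0, archive) > novelty(1.05, archive))  # True
```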
The patterns that make up the existing testing libraries represent a structured form of knowledge. Consequently, it might be feasible to incorporate machine learning techniques into the whole flow of software development and maintenance processes. For instance, unit tests could allow us to learn from a huge set of predefined testing examples. Furthermore, it could be interesting to use hybrid methods, guided by multi-objective optimization criteria of diversity, to generate improved and amplified test suites. The application of machine learning for test design and pattern detection is a promising area still under research.
IV. CONCLUSIONS
This report presented some ideas for integrating machine learning principles into software engineering. In particular, we discussed some applications to the fields of both software diversification and automatic test amplification. We reviewed various important areas, such as the generation of synthetic versions of software, the use of ensemble learning approaches, the exploration of source code and repository mining, and the automatic amplification and refinement of test cases. To sum up, we identify the following interesting research lines:
• Represent source code structures, or even entire programs, as instances for performing machine learning tasks.
• Study the diversity measures proposed for ensemble learning and their application to assess software diversity.
• Analyze natural software diversity using repository mining.
• Investigate the advantages of machine learning methods, in conjunction with novel search-based approaches, for automatic test data amplification.
This report serves as a starting point for the author to strengthen his understanding of these issues, with the purpose of identifying novel and promising future research directions.
 S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning:
From Theory to Algorithms. Cambridge University Press, 2014.
 J. Han and M. Kamber, Data Mining Concepts and Techniques, M. Kauf-
mann, Ed. Morgan Kaufmann Publishers, 2006.
 I. H. Witten, E. Frank, and M. A. Hall, Data Mining Practical Machine
Learning Tools and Techniques, 3rd ed. Morgan Kaufmann Publishers,
 D. Zhang, Advances in machine learning applications in software
engineering. IGI Global, 2006.
 T. Hall, S. Beecham, D. Bowes, D. Gray, and S. Counsell, “A systematic
literature review on fault prediction performance in software engineer-
ing,” IEEE Transactions on Software Engineering, vol. 38, no. 6, pp.
 K. Dejaeger, W. Verbeke, D. Martens, and B. Baesens, “Data mining
techniques for software effort estimation: a comparative study,” IEEE
transactions on software engineering, vol. 38, no. 2, pp. 375–397, 2012.
 L. An and F. Khomh, “An empirical study of crash-inducing commits in
mozilla ﬁrefox,” in Proceedings of the 11th international conference on
predictive models and data analytics in software engineering. ACM,
2015, p. 5.
 L. L. Minku and X. Yao, “How to make best use of cross-company data
in software effort estimation?” in Proceedings of the 36th International
Conference on Software Engineering. ACM, 2014, pp. 446–456.
 B. Baudry and M. Monperrus, “The multiple facets of software diversity:
Recent developments in year 2000 and beyond,” ACM Computing
Surveys (CSUR), vol. 48, no. 1, p. 16, 2015.
 B. Danglot, O. Vera-Perez, Z. Yu, M. Monperrus, and B. Baudry,
“The emerging ﬁeld of test ampliﬁcation: A survey,” arXiv preprint
 Y. Yang, S. Zhu, and G. Cao, “Improving sensor network immunity
under worm attacks : A software diversity approach,” Ad Hoc Networks,
vol. 0, pp. 1–15, 2016.
 L. Chen and A. Avizienis, “N-version programming: A fault-tolerance
approach to reliability of software operation,” in Proc. of the Int. Symp.
on Fault-Tolerant Computing (FTCS78), 1978, pp. 3–9.
 B. Randell, “System structure for software fault tolerance,” IEEE Trans-
actions on Software Engineering, no. 2, pp. 220–232, 1975.
 Z. Lin, R. Riley, and D. Xu, “Polymorphing software by randomizing
data structure layout.” in DIMVA, vol. 9. Springer, 2009, pp. 107–126.
 G. S. Kc, A. D. Keromytis, and V. Prevelakis, “Countering code-
injection attacks with instruction-set randomization,” in Proc. of the conf.
on Computer and communications security (CCS), no. 272–280, 2003.
 D. Mendez, B. Baudry, and M. Monperrus, “Empirical evidence of
large-scale diversity in api usage of object-oriented software,” in Source
Code Analysis and Manipulation (SCAM), 2013 IEEE 13th International
Working Conference on. IEEE, 2013, pp. 43–52.
 J. E. Just and M. Cornwell, “Review and analysis of synthetic diversity
for breaking mono–cultures,” in Proceedings of the 2004 ACM workshop
on Rapid malcode (WORM 04), ACM, Ed., 2004, pp. 23–32.
 B. Baudry, M. Monperrus, C. Mony, F. Chauvel, F. Fleurey, and
S. Clarke, “Diversify: Ecology-inspired software evolution for diver-
sity emergence,” in Software Maintenance, Reengineering and Reverse
Engineering (CSMR-WCRE), 2014 Software Evolution Week-IEEE Con-
ference on. IEEE, 2014, pp. 395–398.
 S. Allier, O. Barais, B. Baudry, J. Bourcier, F. Fleurey, M. Monperrus,
H. Song, and M. Tricoire, “Multi-tier diversiﬁcation in web-based soft-
ware applications,” IEEE Software, Institute of Electrical and Electronics
Engineers, vol. 32, no. 1, pp. 83–90, 2015.
 D. Partridge and W. Krzanowskib, “Software diversity : practical statis-
tics for its measurement,” 1997.
 D. Posnett, R. DSouza, P. Devanbu, and V. Filkov, “Dual ecological
measures of focus in software development,” in 35th International
Conference on Software Engineering (ICSE), 2013, pp. 452–461.
 B. Baudry, S. Allier, M. Rodriguez-Cancio, and M. Monperrus, “Automatic software diversity in the light of test suites,” arXiv preprint.
 R. Feldt, “Generating diverse software versions with genetic programming: an experimental study,” in IEE Proceedings-Software, vol. 145, no. 6, 1998, pp. 228–236.
 M. Rinard, “Obtaining and reasoning about good enough software,” in
Design Automation Conference (DAC), 2012 49th ACM/EDAC/IEEE.
IEEE, 2012, pp. 930–935.
 S. Sidiroglou-Douskos, S. Misailovic, H. Hoffmann, and M. Rinard,
“Managing performance vs. accuracy trade-offs with loop perforation,”
in Proceedings of the 19th ACM SIGSOFT symposium and the 13th
European conference on Foundations of software engineering. ACM,
2011, pp. 124–134.
 C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer, “GenProg: A
generic method for automatic software repair,” IEEE Transactions on
Software Engineering, vol. 38, no. 1, pp. 54–72, 2012.
 E. Schulte, Z. P. Fry, E. Fast, W. Weimer, and S. Forrest, “Software
mutational robustness,” arXiv preprint arXiv:1204.4224, 2012.
 B. Baudry, S. Allier, and M. Monperrus, “Tailored source code trans-
formations to synthesize computationally diverse program variants,” in
Proceedings of the 2014 International Symposium on Software Testing
and Analysis. ACM, 2014, pp. 149–159.
 M. Martinez, L. Duchien, and M. Monperrus, “Automatically extracting
instances of code change patterns with ast analysis,” IEEE International
Conference on Software Maintenance, pp. 388–391, 2013.
 O. Sahin and B. Akay, “Comparisons of metaheuristic algorithms
and fitness functions on software test data generation,” Applied Soft
Computing, vol. 49, pp. 1202–1214, 2016.
 M. L. Collard and J. I. Maletic, “srcML 1.0: Explore, analyze, and
manipulate source code.” in ICSME, 2016, p. 649.
 N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, pp. 341–378, 2002.
 G. Yao, H. Zeng, F. Chao, C. Su, C.-M. Lin, and C. Zhou, “Integration of classifier diversity measures for feature selection-based classifier ensemble reduction,” Soft Computing - A Fusion of Foundations, Methodologies and Applications, vol. 20, no. 8, pp. 2995–3005, 2016.
 L. I. Kuncheva and C. J. Whitaker, “Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy,” Machine Learning, vol. 51, no. 2, pp. 181–207, 2003.
 B. Sun, J. Wang, H. Chen, and Y.-t. Wang, “Diversity measures in ensemble learning,” Control and Decision, vol. 29, no. 3, pp. 385–395.
 K. Yeboah-Antwi and B. Baudry, “Embedding adaptivity in software
systems using the ECSELR framework,” in Proceedings of the Companion
Publication of the 2015 Annual Conference on Genetic and Evolutionary
Computation. ACM, 2015, pp. 839–844.
 C. Zhu, Y. Li, J. Rubin, and M. Chechik, “A dataset for dynamic
discovery of semantic changes in version controlled software histories,”
in Proceedings of the 14th International Conference on Mining Software
Repositories. IEEE Press, 2017, pp. 523–526.
 G. Robles, J. M. González-Barahona, C. Cervigón, A. Capiluppi, and D. Izquierdo-Cortázar, “Estimating development effort in free/open source software projects by mining software repositories: a case study of OpenStack,” in Proceedings of the 11th Working Conference on Mining Software Repositories. ACM, 2014, pp. 222–231.
 M. Mäntylä, B. Adams, G. Destefanis, D. Graziotin, and M. Ortu, “Mining valence, arousal, and dominance: possibilities for detecting burnout and productivity?” in Proceedings of the 13th International Conference on Mining Software Repositories. ACM, 2016, pp. 247–
 P. Thongtanunam, R. G. Kula, and A. E. Camargo Cruz, “Improving code review effectiveness through reviewer recommendations,” pp. 1–4, 2014.
 L. Marks, Y. Zou, and A. E. Hassan, “Studying the fix-time for bugs in large open source projects,” in Proceedings of the 7th International Conference on Predictive Models in Software Engineering. ACM, 2011.
 R. Coelho, L. Almeida, G. Gousios, A. Van Deursen, and C. Treude,
“Exception handling bug hazards in android-results from a mining study
and an exploratory survey.” Empirical Software Engineering, vol. 22,
no. 3, pp. 1264–1304, 2017.
 M. Soto, F. Thung, C.-P. Wong, C. Le Goues, and D. Lo, “A deeper
look into bug fixes: Patterns, replacements, deletions, and additions,” in
Proceedings of the 13th International Conference on Mining Software
Repositories. ACM, 2016, pp. 512–515.
 D. Steidl and N. Göde, “Feature-based detection of bugs in clones,”
in Proceedings of the 7th International Workshop on Software Clones.
IEEE Press, 2013, pp. 76–82.
 P. Ammann and J. Offutt, Introduction to Software Testing. Cambridge University Press, 2008.
 J. Zhang, Y. Lou, L. Zhang, D. Hao, L. Zhang, and H. Mei, “Isomorphic regression testing: executing uncovered branches without test augmentation,” in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 2016.
 M. Patrick and Y. Jia, “KD-ART: Should we intensify or diversify tests to kill mutants?” Information and Software Technology, vol. 81, pp. 36–51.
 B. H. Smith and L. Williams, “On guiding the augmentation of an auto-
mated test suite via mutation analysis,” Empirical Software Engineering,
vol. 14, no. 3, pp. 341–379, 2009.
 B. Baudry, F. Fleurey, J.-M. Jézéquel, and Y. Le Traon, “From genetic to
bacteriological algorithms for mutation-based testing,” Software Testing,
Verification and Reliability, vol. 15, no. 2, pp. 73–96, 2005.
 D. E. Goldberg, “Genetic algorithms in search, optimization, and ma-
chine learning,” Reading: Addison-Wesley, 1989.
 R. P. Pargas, M. J. Harrold, and R. R. Peck, “Test-data generation using
genetic algorithms,” Software Testing, Verification and Reliability, vol. 9,
no. 4, pp. 263–282, 1999.
 B. Baudry, F. Fleurey, J.-M. Jézéquel, and Y. Le Traon, “Automatic test case optimization: A bacteriologic algorithm,” IEEE Software, vol. 22, no. 2, pp. 76–82, 2005.
 S. Yoo and M. Harman, “Test data regeneration: generating new test data
from existing test data,” Software Testing, Verification and Reliability,
vol. 22, no. 3, pp. 171–201, 2012.
 K. Lakhotia, M. Harman, and P. McMinn, “A multi-objective approach to search-based test data generation,” in Proceedings of the 9th annual conference on Genetic and evolutionary computation. ACM, 2007.
 R. A. Matnei Filho and S. R. Vergilio, “A mutation and multi-objective
test data generation approach for feature testing of software product
lines,” in Software Engineering (SBES), 2015 29th Brazilian Symposium
on. IEEE, 2015, pp. 21–30.
 J. Ferrer, F. Chicano, and E. Alba, “Evolutionary algorithms for the
multi-objective test data generation problem,” Software: Practice and
Experience, vol. 42, no. 11, pp. 1331–1362, 2012.
 R. A. Matnei Filho and S. R. Vergilio, “A multi-objective test data
generation approach for mutation testing of feature models,” Journal
of Software Engineering Research and Development, vol. 4, no. 1, p. 4.
 M. Boussaa, O. Barais, G. Sunyé, and B. Baudry, “A novelty search
approach for automatic test data generation,” in Proceedings of the
Eighth International Workshop on Search-Based Software Testing. IEEE
Press, 2015, pp. 40–43.
 J. Lehman and K. O. Stanley, “Exploiting open-endedness to solve
problems through the search for novelty,” in ALIFE, 2008, pp. 329–336.
 M. Zanoni, F. A. Fontana, and F. Stella, “On applying machine learn-
ing techniques for design pattern detection,” Journal of Systems and
Software, vol. 103, pp. 102–117, 2015.