Testing of Future Internet Applications
Running in the Cloud
Tanja Vos
Universidad Politécnica de Valencia, Spain
Paolo Tonella
Fondazione Bruno Kessler, Trento, Italy
Joachim Wegener
Berner & Mattner
Mark Harman
University College London
Wishnu Prasetya
University of Utrecht
Yarden Nir-Buchbinder
IBM Israel
Shmuel Ur
Bristol University
ABSTRACT
The cloud will be populated by Future Internet (FI) software, comprising advanced, dynamic and
largely autonomic interactions among services, end-user applications, content and media. The
complexity of the technologies involved in the cloud makes testing extremely challenging and demands
novel approaches and major advancements in the field.
This chapter describes the main challenges associated with testing FI applications running in the cloud.
We present a research agenda that was defined to address the FI testing challenges. The goal of the
agenda is to investigate technologies for the development of an FI automated testing environment,
which can monitor the FI applications under test and react dynamically to the observed changes.
Realizing this environment involves substantial research in areas such as search based testing, model
inference, oracle learning and anomaly detection.
INTRODUCTION
The FI will be a complex interconnection of services, applications, content and media running in the
cloud. It will offer a rich user experience, extending and improving current hyperlink-based navigation.
Key technologies contributing to the development of FI services and applications include a rich, complex,
dynamic and stateful client. This client interacts asynchronously with the server, where applications are
organized as services and run in the cloud, taking advantage of dynamic service discovery, replacement
and composition. Adaptivity and autonomy improve the user experience, by dynamically changing both
the client and the server side, through capabilities such as self-configuration and self-healing. As a
consequence, FI applications will exhibit emergent behavior which makes them hard to predict.
Our society will become increasingly dependent on services built on top of this complex and emerging
Future Internet. Critical activities such as public utilities, social services, government, learning, finance,
business, but also entertainment will depend on the underlying software and services. As a consequence,
the applications running on top of the Future Internet will have to meet high quality and dependability
demands. Not only is functional quality important; non-functional aspects like performance, security,
and privacy will become increasingly important as well. All of this makes verification and validation
for quality assurance of FI applications extremely important. In this chapter, we discuss how to address
the FI testing challenges, by describing the features of an integrated environment for continuous
evolutionary automated testing, which can monitor the FI application under test and adapt to the dynamic
changes observed. FI testing will require continuous post-release testing since the application under test
does not remain fixed after its release. Services and components could be dynamically added by
customers and the intended use could change significantly. Therefore, testing is done continuously after
deployment to the customer, either in vitro or in vivo. The testing environment we describe integrates,
adapts and automates techniques for continuous FI testing (e.g. dynamic model inference, log-based
diagnosis, oracle learning, classification trees and combinatorial testing, concurrent testing, regression
testing). To make it possible for the above-mentioned techniques to deal with the huge search space
associated with FI testing, evolutionary search based testing will be used. Search-based algorithms will be
used to guide the solution of identified problems so as to optimize properly defined objective functions. In
this way, we can address the ultimate challenge of FI applications: testing unexpected behavior that may
originate from the dynamism, autonomy and self-adaptation involved.
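As a concrete, heavily simplified illustration of this search based core (a sketch under invented assumptions, not the environment described in this chapter), the following Python fragment evolves integer inputs of a toy system under test so as to minimize a branch-distance objective function for a hard-to-reach branch:

```python
import random

def system_under_test(x, y):
    """Invented program: the first branch is the coverage target."""
    if x * 2 == y + 10:
        return "target branch"
    return "other branch"

def fitness(individual):
    """Branch distance: 0 means the target branch is taken."""
    x, y = individual
    return abs(x * 2 - (y + 10))

def evolve(pop_size=50, generations=200):
    pop = [(random.randint(-1000, 1000), random.randint(-1000, 1000))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        if fitness(pop[0]) == 0:
            return pop[0]                     # covering test input found
        parents = pop[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            (x1, _), (_, y2) = random.choice(parents), random.choice(parents)
            children.append((x1 + random.randint(-5, 5),   # crossover + mutation
                             y2 + random.randint(-5, 5)))
        pop = parents + children
    return min(pop, key=fitness)              # best effort after budget

best = evolve()
print(best, "->", system_under_test(*best))
```

In the envisioned environment, the fitness function would instead encode FI-specific testing goals and would be adapted continuously as the application under test changes.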
BACKGROUND
FI testing demands major advancements in several areas of software testing. We discuss the state of
the art in each of these areas separately in the following.
Beyond the state of the art of search based techniques
Search based techniques currently enjoy wide interest (Harman, 2007). Testing is the most
prominent software engineering domain for the application of search
techniques. Search based testing techniques have been applied to various real world complex systems
(e.g., embedded systems) (Vos et al., 2010; Baars et al., 2010) to deal with automated test case generation
for structural (white-box) as well as functional (black-box) testing. While these testing targets remain
relevant for FI applications as well, the continuous, autonomous testing framework that we envision
introduces new opportunities for search based exploration of the solution space. Correspondingly, novel
fitness function definitions and search algorithms will be required.
Innovative approaches to genetic programming applied to testing may also contribute to FI testing. So far,
genetic programming has received limited attention in testing. It has been successfully used to conduct
unit testing of object oriented code, by providing a simple and effective mechanism to bring the object
under test to a proper internal state (Tonella, 2004). We think genetic programming can be pushed beyond
such simple applications, by considering it as a powerful technique to co-evolve the testing engine
together with the self-modifying, adaptive FI application under test.
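The following fragment gives a deliberately small flavor of that starting point, in the spirit of (Tonella, 2004): method-call sequences on an invented Account class are mutated until the object reaches an internal state in which the guarded branch of withdraw can be exercised. Real genetic programming would add crossover and a graded fitness; everything here is an illustrative assumption.

```python
import random

class Account:
    """Invented class under test with a guarded target branch."""
    def __init__(self):
        self.balance = 0
        self.frozen = False
    def deposit(self, amount):
        self.balance += amount
    def freeze(self):
        self.frozen = True
    def withdraw(self, amount):
        if self.balance >= amount and not self.frozen:   # target branch
            self.balance -= amount
            return True
        return False

# The "instruction set" from which call sequences are built.
CALLS = [("deposit", 50), ("deposit", 10), ("freeze", None), ("withdraw", 30)]

def covers_target(sequence):
    """Run a call sequence on a fresh object; report if withdraw succeeded."""
    acc, hit = Account(), False
    for name, arg in sequence:
        method = getattr(acc, name)
        result = method() if arg is None else method(arg)
        if name == "withdraw" and result:
            hit = True
    return hit

def evolve(pop_size=30, generations=200, length=4):
    pop = [[random.choice(CALLS) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        for seq in pop:
            if covers_target(seq):
                return seq                    # test sequence hitting the branch
        for seq in pop:                       # mutation-only "genetic" step
            seq[random.randrange(length)] = random.choice(CALLS)
    return None

print(evolve())
```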
Beyond the state of the art of Web testing
The vast, existing literature (Ricca & Tonella, 2001; Elbaum et al., 2005; Sampath et al., 2007) on Web
testing is focused on client-server applications which implement a strictly serialized model of interaction,
based on sequences of form submission and server response. Testing of Ajax and rich client Web applications
has been considered only recently (Mesbah & van Deursen, 2009; Marchetto et al., 2008). To address FI
testing, such results should be extended in the direction of increasing the level of automation and of
supporting continuous, unattended test generation and execution. These extensions require research in the
area of automated model and invariant inference, anomaly detection and input data generation. If these
involve monitoring or logging production runs, the introduced overhead may be significant; so ultimately
the issue of performance also needs to be addressed.
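As a tiny illustration of the invariant-based oracle idea explored for Ajax applications (cf. Mesbah & van Deursen, 2009), the sketch below checks one generic DOM invariant, uniqueness of element ids, on an invented HTML snapshot; a real tool would check many such invariants on every client-side state reached while crawling:

```python
from html.parser import HTMLParser

class UniqueIdChecker(HTMLParser):
    """Oracle for one generic DOM invariant: element ids must be unique."""
    def __init__(self):
        super().__init__()
        self.seen = set()
        self.violations = []
    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id":
                if value in self.seen:
                    self.violations.append(value)
                self.seen.add(value)

# Invented DOM snapshot, as it might be captured after an Ajax update.
snapshot = '<div id="cart"><span id="total"></span><span id="total"></span></div>'
checker = UniqueIdChecker()
checker.feed(snapshot)
print("invariant violations:", checker.violations)   # ['total']
```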
Beyond the state of the art of model inference
FI testing requires major advancements of the state of the art for model inference. Existing techniques
(Lorenzoli et al., 2008; Dallmeier et al., 2006) rely either on algorithms for regular language learning or
on predefined, hardcoded abstraction functions. The former produce a state model that is hardly
interpretable by humans: while event sequences are meaningful, states do not necessarily correspond to
an internal state of the FI application. Since we want to provide meaningful feedback to testers, this
approach is not attractive. The latter option is hardly viable in the dynamic, adaptive context of the FI:
no abstraction defined in advance can be adequate for its dynamic nature.
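The following sketch shows the abstraction-function flavor of model inference on invented traces; the state is simply the last observed event, which is precisely the kind of predefined, hardcoded abstraction that cannot keep up with the dynamism of the FI:

```python
from collections import defaultdict

# Invented execution traces logged from the application under test.
traces = [
    ["login", "search", "addToCart", "checkout", "logout"],
    ["login", "search", "search", "logout"],
]

def abstraction(event):
    # Hardcoded abstraction: the state is just the last observed event.
    return event

model = defaultdict(set)
for trace in traces:
    state = "START"
    for event in trace:
        model[state].add((event, abstraction(event)))
        state = abstraction(event)

for state in model:
    for event, target in sorted(model[state]):
        print(f"{state} --{event}--> {target}")
```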
Beyond the state of the art of log based diagnosis and oracle learning
There has been substantial research in the use of logs for testing (Andrews & Zhang, 2000; Rozinat & van
der Aalst, 2007). This research focuses on observing errors from logs. FI testing needs to extend this with
the capability to infer oracles, likely oracles, and atypical executions from logs. State of the art tools for
log based diagnosis (Lorenzoli et al., 2008; Hangal & Lam, 2002) rely on invariant inference algorithms
such as the one implemented in Daikon (Ernst et al., 2007). These algorithms should be extended to
support richer temporal logics, used to express the inferred invariants. Temporal invariants are more
appropriate than Daikon's static invariants for finite state models, since the correctness of model
behavior depends strictly on the sequence of states traversed over time. On the one hand, this demands
richer information in the logs. On the other hand, in practice an unauthorized third party may manage
to acquire logs, so the issue of privacy must also be addressed.
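As a minimal example of the kind of temporal invariant meant here (invented events, not the output of any existing tool), the sketch below checks over a log that every 'request' event is eventually followed by a matching 'response' event:

```python
def holds_response_follows_request(events):
    """True iff every 'request' is eventually followed by a 'response'."""
    pending = 0
    for event in events:
        if event == "request":
            pending += 1
        elif event == "response" and pending > 0:
            pending -= 1
    return pending == 0

# Invented log fragments.
print(holds_response_follows_request(
    ["request", "response", "request", "request", "response", "response"]))  # True
print(holds_response_follows_request(
    ["request", "response", "request"]))                                     # False
```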
Beyond the state of the art of coverage testing
FI applications are (ultra-)large systems, for which current coverage criteria are not meaningful
and require substantial extensions (Adler et al., 2009). One difficulty is that while the application-specific
code is expected to be fully utilized, not all the capabilities of the off-the-shelf components are used, or
can be used, in the application. Consequently, a coverage measure of, say, 50% does not make clear
whether we have performed a thorough test that covered all the relevant functionality, or whether a lot
of relevant functionality is still left untested. For coverage to be useful, we need a methodology that
severely limits such false reporting and filters out the code that is not relevant
to the solution.
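The sketch below illustrates the filtering idea with invented file names and counts: the same raw coverage data yields a meaningless figure over the whole system, and an informative one once off-the-shelf code is filtered out:

```python
# Invented coverage data: line numbers hit per file, and file sizes.
covered = {
    "app/cart.py": {1, 2, 3, 5},
    "app/search.py": {1, 2},
    "vendor/framework.py": {1},      # off-the-shelf, mostly unexercised
}
total_lines = {
    "app/cart.py": 6,
    "app/search.py": 4,
    "vendor/framework.py": 5000,
}

def line_coverage(is_relevant):
    """Coverage percentage restricted to files deemed relevant."""
    files = [f for f in total_lines if is_relevant(f)]
    hit = sum(len(covered[f]) for f in files)
    return hit / sum(total_lines[f] for f in files)

print(f"raw:      {line_coverage(lambda f: True):.1%}")                  # ~0.1%
print(f"filtered: {line_coverage(lambda f: f.startswith('app/')):.1%}")  # 60.0%
```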
Beyond the state of the art of concurrency testing
State of the art techniques for concurrency testing are focused on single applications that exhibit some
degree of internal parallelism and distribution (Křena et al., 2009). We need solutions for the FI context,
where applications are not self-contained and act more as service composers and integrators than as
simple functionality providers. Such integration is characterized by a class of concurrency which is harder
to treat and control than in traditional concurrency testing. Approaches based on explicit, direct control
of timings and scheduling require substantial enhancements: since the integration itself comprises
multiple concurrently executing parts, they must be augmented with capabilities for altering the ongoing
communications, as well as the distribution of the different components over multiple machines.
Novel debugging mechanisms and novel record-replay functionalities are also needed to
support testers trying to identify the causes of bugs. Artificial load creation is another way to exercise an
FI integrator application in a way that increases the chances of revealing faults.
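The following sketch shows the noise-injection flavor of such schedule control, in the spirit of tools like the one of (Křena et al., 2009): randomly yielding inside an unsynchronized read-modify-write widens the racy window, so the lost-update bug on the shared counter surfaces more reliably. The code and probabilities are illustrative assumptions:

```python
import random
import threading
import time

counter = 0

def worker(noise):
    """Unsynchronized read-modify-write on a shared counter (a race)."""
    global counter
    for _ in range(1000):
        tmp = counter                       # read
        if noise and random.random() < 0.01:
            time.sleep(0)                   # injected noise: yield the CPU
        counter = tmp + 1                   # write back (updates may be lost)

for noise in (False, True):
    counter = 0
    threads = [threading.Thread(target=worker, args=(noise,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"noise={noise}: counter={counter} (expected 4000)")
```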
Beyond the state of the art of combinatorial testing
Combinatorial testing using classification trees is a test case generation technique widely used in industry
(Kruse & Luniak, 2010). For FI technologies, the number of classes in a classification tree could become
very large, resulting in a tremendous number of possible test cases. To address FI testing, more research is
needed to increase the level of automation and support for continuous, dynamic generation of
classification trees and fault-sensitive test case generation. These extensions require research in the area
of new combinatorial techniques for test case generation, e.g. the inclusion of statistical information like
operational profiles of internet applications to generate a representative set of test cases, the application of
evolutionary search techniques to search for the optimal test suites, and the combination of classification
trees with oracle learning to automate expected values prediction.
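To make the combinatorial explosion and its mitigation concrete, the sketch below generates a pairwise test suite from a small classification-tree-like domain model using a naive greedy algorithm; the parameters and classes are invented, and industrial tools such as the CTE use far more sophisticated generation rules:

```python
from itertools import combinations, product

# Invented classification-tree-like domain model of an FI application.
classes = {
    "browser": ["firefox", "chrome"],
    "network": ["wifi", "3g"],
    "locale":  ["en", "de", "fr"],
}
params = list(classes)

# Every pair of classes from two different parameters must be covered.
all_pairs = {((p1, v1), (p2, v2))
             for p1, p2 in combinations(params, 2)
             for v1 in classes[p1] for v2 in classes[p2]}

def pairs_of(case):
    named = list(zip(params, case))
    return set(combinations(named, 2))

suite, uncovered = [], set(all_pairs)
while uncovered:
    # Greedy: pick the candidate covering the most still-uncovered pairs.
    best = max(product(*classes.values()),
               key=lambda c: len(pairs_of(c) & uncovered))
    suite.append(best)
    uncovered -= pairs_of(best)

exhaustive = len(list(product(*classes.values())))
print(f"{len(suite)} pairwise tests instead of {exhaustive} exhaustive ones:")
for case in suite:
    print(dict(zip(params, case)))
```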
Beyond the state of the art of empirical evaluation
In order to assess the testing techniques developed, evaluative research must involve realistic FI systems
and realistic subjects. It should be done with thoroughness to ensure that any benefits identified during the
evaluation study are clearly derived from the testing technique studied. It should be done in such a way
that different studies can be compared. This type of research is time-consuming, expensive and difficult.
However, it is fundamental, since claims based on analytical advocacy alone are insupportable (Fenton et
al., 1994). What is needed (Hesari et al., 2010) is a general methodology evaluation framework that will
simplify the evaluation procedure and make the results more accurate and reliable.
CHALLENGES
FI applications will be characterized by an extremely high level of dynamism. Most decisions, normally
made at design time, are deferred to execution time, when the application can take advantage of
monitoring (self-observation, as well as data collection from the environment and logging of the
interactions) to adapt itself to a changed usage context. The realization of this vision involves a number of
technologies, including: observational reflection and monitoring; dynamic discovery and composition of
services; hot component loading and update; structural reflection, to support self-adaptation and
modification; asynchronous communication; high configurability and context awareness; composability
into large-scale systems of systems.
While offering major improvements over the currently available Web experience, such features pose
several challenges to testing, summarized in Table 1.
CH1. Self modification: Rich clients have increased capability to dynamically adapt the structure of
Web pages; server-side services are replaced and recomposed dynamically based on Service Level
Agreements (SLAs), taking advantage of services newly discovered in the cloud; components are
dynamically loaded.
CH2. Autonomic behavior: FI applications are highly autonomous; their correct behavior cannot be
specified precisely at design time.
CH3. Low observability: FI applications are composed of an increasing number of third-party
components and services running in the cloud, accessed as black boxes, which are hard to test.
CH4. Asynchronous interactions: FI applications are highly asynchronous and hence hard to test. Each
client submits multiple requests asynchronously; multiple clients run in parallel; server-side
computation is distributed over the cloud and concurrent.
CH5. Time and load dependent behavior: For FI applications, factors like timing and load conditions
make it hard to reproduce errors during debugging.
CH6. Huge feature configuration space: FI applications are highly customizable and self-configuring,
and contain a huge number of configurable features, such as user-, context- and environment-dependent
parameters.
CH7. Ultra-large scale: FI applications are often systems of systems running in the cloud; traditional
testing adequacy criteria cannot be applied, since even in good testing situations low coverage will be
achieved.
Table 1. Main testing challenges for FI applications
RESEARCH AGENDA
The testing challenges enumerated in Table 1 can be addressed by developing tools and techniques for
continuous evolutionary automated testing, which can monitor the FI application and adapt themselves to
the dynamic changes observed. FI testing will be continuous post-release testing since the application
under test does not remain fixed after its release. Services and components could be dynamically added
by customers and the intended use could change. Therefore, testing has to be performed continuously
after deployment to the customer.
The underlying technology we propose for FI testing, which will enable the above-mentioned
techniques to cope with FI testing challenges such as dynamism, self-adaptation and partial
observability, is evolutionary search based testing. The impossibility of anticipating all possible
behaviors of FI applications suggests a prominent role for evolutionary testing techniques, because they
rely on very few assumptions about the underlying problem being solved. In addition, stochastic
optimization and search techniques are adaptive and therefore able to modify their behavior when faced
with new, unforeseen situations. These two properties, freedom from limiting assumptions and inherent
adaptiveness, make evolutionary testing approaches ideal for handling FI applications testing, with their
dynamic, self-adapting, autonomous and unpredictable behavior.
To achieve this overall aim, a number of research objectives that directly map to the identified challenges
should be investigated in the future. A summary of these research objectives is given in Table 2. This book
chapter describes the approaches and techniques that can be followed to conduct research and achieve
practical results in each of these areas.
OBJ1. Evolutionary, search based testing approach
OBJ2. Continuous, automated testing approach
OBJ3. Dynamic model inference
OBJ4. Model based test case derivation
OBJ5. Log-based diagnosis and oracle learning
OBJ6. Dynamic classification tree generation
OBJ7. Test for concurrency bugs
OBJ8. Testing the unexpected
OBJ9. Coverage and regression testing
OBJ10. General evaluation framework for FI testing
Table 2. Research objectives in FI testing
CONCLUSION
The challenges involved in testing FI applications running in the cloud can be addressed by resorting to a
combination of advanced testing technologies (i.e. dynamic model inference, model-based test case
derivation, combinatorial testing, concurrent testing, regression testing, etc.) adapted to ensure a level of
automation that enables testing in a continuous mode. As a consequence, testing time will increase
without any additional burden on the human resources involved, substantially improving the
affordability of the proposed FI testing techniques. Moreover, extended testing time results in improved
reliability and helps to cope with adaptive and dynamic changes of the FI software, which can be
observed only when the software is operated for long periods of time. The observed behaviors of FI
applications are automatically classified as normal or anomalous, so as to automatically and
autonomously trigger additional testing when needed.
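A minimal sketch of this classification step, with an invented inferred model and traces: behaviors whose event transitions never occurred during model inference are flagged as anomalous and would trigger additional testing:

```python
# Invented model: transitions observed while the model was inferred.
inferred_transitions = {
    ("login", "search"), ("search", "search"),
    ("search", "checkout"), ("checkout", "logout"),
}

def classify(trace):
    """Flag transitions never seen during model inference."""
    anomalies = [(a, b) for a, b in zip(trace, trace[1:])
                 if (a, b) not in inferred_transitions]
    return "normal" if not anomalies else f"anomalous: {anomalies}"

print(classify(["login", "search", "checkout", "logout"]))   # normal
print(classify(["login", "checkout"]))                       # anomalous
```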
The oracle component of the FI testing environment is key to ensure high quality and reliability of the
software running in the cloud. Hence, methods that support automated learning of oracles from
observations have great potential to reduce the cost of testing in the cloud scenario.
The flexibility of the search based approach to testing with respect to the testing goal makes it a suitable
unifying technology to address the FI testing challenges. The fitness function that guides test case
generation can be adapted to the features of FI applications and can track their continuous evolution and
autonomous modification over time. Genetic programming is an appealing option to allow the creation of
test programs that co-evolve with the FI application under test.
REFERENCES
Adler, Y., Farchi, E., Klausner, M., Pelleg, D., Raz, O., Shochat, M., Ur, S., & Zlotnick, A. (2009).
Advanced code coverage analysis using substring holes. In Proceedings of the Eighteenth International
Symposium on Software Testing and Analysis (ISSTA '09) (pp. 37-46). ACM.
Andrews, J. H., & Zhang, Y. (2000). Broad-spectrum studies of log file analysis. In Proceedings of the
International Conference on Software Engineering (pp. 105-114).
Baars, A., Vos, T., & Dimitrov, D. (2010). Using Evolutionary Testing to Find Test Scenarios for Hard
to Reproduce Faults. In Proceedings of the Third International Conference on Software Testing,
Verification, and Validation Workshops (pp. 173-181). IEEE.
Dallmeier, V., Lindig, C., Wasylkowski, A., & Zeller, A. (2006). Mining Object Behavior with
ADABU. In Proceedings of the ICSE Workshop on Dynamic Analysis.
Edelstein, O., Farchi, E., Goldin, E., Nir, Y., Ratsaby, G., & Ur, S. (2003). Framework for testing
multi-threaded Java programs. Concurrency and Computation: Practice & Experience, 15(3-5), 485-499.
Elbaum, S. G., Rothermel, G., Karre, S., & Fisher II, M. (2005). Leveraging User-Session Data to
Support Web Application Testing. IEEE Transactions on Software Engineering, 31(3), 187-202.
Ernst, M., Perkins, J., Guo, P., McCamant, S., Pacheco, C., Tschantz, M., & Xiao, C. (2007). The
Daikon system for dynamic detection of likely invariants. Science of Computer Programming, 69(1-3),
35-45.
Fenton, N., Pfleeger, S. L., & Glass, R. L. (1994). Science and Substance: A Challenge to Software
Engineers. IEEE Software, July/August 1994, 86-95.
Hangal, S., & Lam, M. S. (2002). Tracking down software bugs using automatic anomaly detection. In
Proceedings of the International Conference on Software Engineering.
Harman, M. (2007). The Current State and Future of SBSE. In Proceedings of Foundations of Software
Engineering (pp. 342-357).
Hesari, S., Mashayekhi, H., & Ramsin, R. (2010). Towards a General Framework for Evaluating
Software Development Methodologies. In Proceedings of COMPSAC 2010 (pp. 208-217).
Křena, B., Letko, Z., Nir-Buchbinder, Y., Tzoref-Brill, R., Ur, S., & Vojnar, T. (2009). A concurrency
testing tool and its plug-ins for dynamic analysis and runtime healing. In Runtime Verification, Lecture
Notes in Computer Science, vol. 5779 (pp. 101-114). Springer.
Kruse, P., & Luniak, M. (2010). Automated Test Case Generation Using Classification Trees. In
Proceedings of STAREAST - Software Testing Conference, Analysis & Review.
Lorenzoli, D., Mariani, L., & Pezzè, M. (2008). Automatic generation of software behavioral models.
In Proceedings of the International Conference on Software Testing (pp. 501-510).
Marchetto, A., Tonella, P., & Ricca, F. (2008). State-Based Testing of Ajax Web Applications. In
Proceedings of the International Conference on Software Testing (pp. 121-130).
Mesbah, A., & van Deursen, A. (2009). Invariant-based automatic testing of AJAX user interfaces. In
Proceedings of the International Conference on Software Engineering (pp. 210-220).
Ricca, F., & Tonella, P. (2001). Analysis and Testing of Web Applications. In Proceedings of the
International Conference on Software Engineering (pp. 25-34).
Rozinat, A., & van der Aalst, W. M. P. (2007). Conformance checking of processes based on
monitoring real behavior. Information Systems, 33(1), 64-95.
Sampath, S., Sprenkle, S., Gibson, E., Pollock, L. L., & Greenwald, A. S. (2007). Applying Concept
Analysis to User-Session-Based Testing of Web Applications. IEEE Transactions on Software
Engineering, 33(10), 643-658.
Tonella, P. (2004). Evolutionary Testing of Classes. In Proceedings of the International Symposium on
Software Testing and Analysis (pp. 119-128).
Vos, T., Baars, A., Lindlar, F., Kruse, P., Windisch, A., & Wegener, J. (2010). Industrial Scaled
Automated Structural Testing with the Evolutionary Testing Tool. In Proceedings of the Third
International Conference on Software Testing, Verification and Validation (ICST '10) (pp. 175-184).
IEEE Computer Society.