Arie van Deursen

Delft University of Technology, Delft, South Holland, Netherlands

Publications (217) · 34.47 Total impact

  • Arie van Deursen, Ali Mesbah, Alex Nederlof
    ABSTRACT: In this paper we review five years of research in the field of automated crawling and testing of web applications. We describe the open source Crawljax tool, and the various extensions that have been proposed in order to address such issues as cross-browser compatibility testing, web application regression testing, and style sheet usage analysis. Based on that, we identify the main challenges and future directions of crawl-based testing of web applications. In particular, we explore ways to reduce the exponential growth of the state space, as well as ways to involve the human tester in the loop, thus reconciling manual exploratory testing and automated test input generation. Finally, we sketch the future of crawl-based testing in the light of upcoming developments, such as the pervasive use of touch devices and mobile computing, and the increasing importance of cyber-security.
    Science of Computer Programming 01/2015; 97. · 0.57 Impact Factor
  • Eric Bouwers, Arie van Deursen, Joost Visser
    ABSTRACT: In the past two decades both the industry and the research community have proposed hundreds of metrics to track software projects, evaluate quality or estimate effort. Unfortunately, it is not always clear which metric works best in a particular context. Even worse, for some metrics there is little evidence whether the metric measures the attribute it was designed to measure. In this paper we propose a catalog format for software metrics as a first step towards a consolidated overview of available software metrics. This format is designed to provide an overview of the status of a metric at a glance, while providing enough information to make an informed decision about the use of the metric. We envision this format to be implemented in a (semantic) wiki to ensure that relationships between metrics can be followed with ease.
    06/2014;
  • Georgios Gousios, Martin Pinzger, Arie van Deursen
    ABSTRACT: The advent of distributed version control systems has led to the development of a new paradigm for distributed software development; instead of pushing changes to a central repository, developers pull them from other repositories and merge them locally. Various code hosting sites, notably GitHub, have tapped into the opportunity to facilitate pull-based development by offering workflow support tools, such as code reviewing systems and integrated issue trackers. In this work, we explore how pull-based software development works, first on the GHTorrent corpus and then on a carefully selected sample of 291 projects. We find that the pull request model offers fast turnaround, increased opportunities for community engagement and decreased time to incorporate contributions. We show that a relatively small number of factors affect both the decision to merge a pull request and the time to process it. We also examine the reasons for pull request rejection and find that technical ones are only a small minority.
    05/2014;
  • Alex Nederlof, Ali Mesbah, Arie van Deursen
    ABSTRACT: Today’s web applications increasingly rely on client-side code execution. HTML is not just created on the server, but manipulated extensively within the browser through JavaScript code. In this paper, we seek to understand the software engineering implications of this. We look at deviations from many known best practices in areas such as performance, accessibility, and correct structuring of HTML documents. Furthermore, we assess to what extent such deviations manifest themselves through client-side code manipulation only. To answer these questions, we conducted a large scale experiment, involving automated client-enabled crawling of over 4000 web applications, resulting in over 100,000,000 pages analyzed, and close to 1,000,000 unique client-side user interface states. Our findings show that the majority of sites contain a substantial number of problems, making sites unnecessarily slow, inaccessible for the visually impaired, and with layout that is unpredictable due to errors in the dynamically modified DOM trees.
    05/2014;
  • Nicolas Dintzner, Arie Van Deursen, Martin Pinzger
    ABSTRACT: The Linux kernel feature model has been studied as an example of large scale evolving feature model and yet details of its evolution are not known. We present here a classification of feature changes occurring on the Linux kernel feature model, as well as a tool, FMDiff, designed to automatically extract those changes. With this tool, we obtained the history of more than twenty architecture specific feature models, over ten releases and compared the recovered information with Kconfig file changes. We establish that FMDiff provides a comprehensive view of feature changes and show that the collected data contains promising information regarding the Linux feature model evolution.
    Proceedings of the Eighth International Workshop on Variability Modelling of Software-Intensive Systems; 01/2014
  • Tao Xie, Thomas Zimmermann, Arie van Deursen
    ABSTRACT: Support for generic programming was added to the Java language in 2004, representing perhaps the most significant change to one of the most widely used programming languages today. Researchers and language designers anticipated this addition would relieve ...
    Empirical Software Engineering 12/2013; 18(6). · 1.18 Impact Factor
  • M. Greiler, A. van Deursen, M.-A. Storey
    ABSTRACT: Designing automated tests is a challenging task. One important concern is how to design test fixtures, i.e. code that initializes and configures the system under test so that it is in an appropriate state for running particular automated tests. Test designers may have to choose between writing in-line fixture code for each test or refactor fixture code so that it can be reused for other tests. Deciding on which approach to use is a balancing act, often trading off maintenance overhead with slow test execution. Additionally, over time, test code quality can erode and test smells can develop, such as the occurrence of overly general fixtures, obscure in-line code and dead fields. In this paper, we show that test smells related to fixture set-up occur in industrial projects. We present a static analysis technique to identify fixture related test smells. We implemented this test analysis technique in a tool, called TestHound, which provides reports on test smells and recommendations for refactoring the smelly test code. We evaluate the tool through three industrial case studies and show that developers find that the tool helps them to understand, reflect on and adjust test code.
    Software Testing, Verification and Validation (ICST), 2013 IEEE Sixth International Conference on; 01/2013
  • ABSTRACT: Distributed teams face the challenge of staying connected. How do team members stay connected when they no longer see each other on a daily basis? What should be done when there is no coffee corner to share your latest exploits? In this paper we evaluate a microblogging system which makes this possible in a distributed setting. The system, WeHomer, enables the sharing of information and corresponding emotions in a fully distributed organization. We analyzed the content of over a year of usage data by 19 team members in a structured fashion, performed 5 semi-structured interviews and report our findings in this paper. We draw conclusions about the topics shared, the impact on software teams and the impact of distribution and team composition. Main findings include an increase in team-connectedness and easier access to information that is traditionally harder to consistently acquire.
    Mining Software Repositories (MSR), 2013 10th IEEE Working Conference on; 01/2013
  • F. Hermans, B. Sedee, M. Pinzger, A. van Deursen
    ABSTRACT: Spreadsheets are widely used in industry: it is estimated that end-user programmers outnumber programmers by a factor of 5. However, spreadsheets are error-prone: numerous companies have lost money because of spreadsheet errors. One of the causes for spreadsheet problems is the prevalence of copy-pasting. In this paper, we study this cloning in spreadsheets. Based on existing text-based clone detection algorithms, we have developed an algorithm to detect data clones in spreadsheets: formulas whose values are copied as plain text in a different location. To evaluate the usefulness of the proposed approach, we conducted two evaluations: a quantitative evaluation in which we analyzed the EUSES corpus and a qualitative evaluation consisting of two case studies. The results of the evaluation clearly indicate that 1) data clones are common, 2) data clones pose threats to spreadsheet quality and 3) our approach supports users in finding and resolving data clones.
    Software Engineering (ICSE), 2013 35th International Conference on; 01/2013
  • E. Bouwers, A. van Deursen, J. Visser
    ABSTRACT: Using software metrics to keep track of the progress and quality of products and processes is a common practice in industry. Additionally, designing, validating and improving metrics is an important research area. Although using software metrics can help in reaching goals, the effects of using metrics incorrectly can be devastating. In this tutorial we leverage 10 years of metrics-based risk assessment experience to illustrate the benefits of software metrics, discuss different types of metrics and explain typical usage scenarios. Additionally, we explore various ways in which metrics can be interpreted using examples solicited from participants and practical assignments based on industry cases. During this process we will present the four common pitfalls of using software metrics. In particular, we explain why metrics should be placed in a context in order to maximize their benefits. A methodology based on benchmarking to provide such a context is discussed and illustrated by a model designed to quantify the technical quality of a software system. Examples of applying this model in industry are given and challenges involved in interpreting such a model are discussed. This tutorial provides an in-depth overview of the benefits and challenges involved in applying software metrics. At the end you will have all the information you need to use, develop and evaluate metrics constructively.
    Software Engineering (ICSE), 2013 35th International Conference on; 01/2013
  • S. Raemaekers, A. van Deursen, J. Visser
    ABSTRACT: We present the Maven Dependency Dataset (MDD), containing metrics, changes and dependencies of 148,253 jar files. Metrics and changes have been calculated at the level of individual methods, classes and packages of multiple library versions. A complete call graph is also presented which includes call, inheritance, containment and historical relationships between all units of the entire repository. In this paper, we describe our dataset and the methodology used to obtain it. We present different conceptual views of MDD and we also describe limitations and data quality issues that researchers using this data should be aware of.
    Mining Software Repositories (MSR), 2013 10th IEEE Working Conference on; 01/2013
  • S. Raemaekers, G.F. Nane, A. van Deursen, J. Visser
    ABSTRACT: Best practices in software development state that code that is likely to change should be encapsulated to localize possible modifications. In this paper, we investigate the application and effects of this design principle. We investigate the relationship between the stability, encapsulation and popularity of libraries on a dataset of 148,253 Java libraries. We find that bigger systems with more rework in existing methods have less stable interfaces and that bigger systems tend to encapsulate dependencies better. Additionally, there are a number of factors that are associated with change in library interfaces, such as rework in existing methods, system size, encapsulation of dependencies and the number of dependencies. We find that current encapsulation practices are not targeted at libraries that change the most. We also investigate the strength of ripple effects caused by instability of dependencies and we find that libraries cause ripple effects in systems using them and that these effects can be mitigated by encapsulation.
    Mining Software Repositories (MSR), 2013 10th IEEE Working Conference on; 01/2013
  • ABSTRACT: Open source software (OSS) development teams use electronic means, such as emails, instant messaging, or forums, to conduct open and public discussions. Researchers investigated mailing lists considering them as a hub for project communication. Prior work focused on specific aspects of emails, for example the handling of patches, traceability concerns, or social networks. This led to insights pertaining to the investigated aspects, but not to a comprehensive view of what developers communicate about. Our objective is to increase the understanding of development mailing lists communication. We quantitatively and qualitatively analyzed a sample of 506 email threads from the development mailing list of a major OSS project, Lucene. Our investigation reveals that implementation details are discussed only in about 35% of the threads, and that a range of other topics is discussed. Moreover, core developers participate in less than 75% of the threads. We observed that the development mailing list is not the main player in OSS project communication, as it also includes other channels such as the issue repository.
    Mining Software Repositories (MSR), 2013 10th IEEE Working Conference on; 01/2013
  • M. Greiler, A. Zaidman, A. van Deursen, M.-A. Storey
    ABSTRACT: An important challenge in creating automated tests is how to design test fixtures, i.e., the setup code that initializes the system under test before actual automated testing can start. Test designers have to choose between different approaches for the setup, trading off maintenance overhead with slow test execution. Over time, test code quality can erode and test smells can develop, such as the occurrence of overly general fixtures, obscure inline code and dead fields. In this paper, we investigate how fixture-related test smells evolve over time by analyzing several thousand revisions of five open source systems. Our findings indicate that setup management strategies strongly influence the types of test fixture smells that emerge in code, and that several types of fixture smells often emerge at the same time. Based on this information, we recommend important guidelines for setup strategies, and suggest how tool support can be improved to help in both avoiding the emergence of such smells as well as how to refactor code when test smells do appear.
    Mining Software Repositories (MSR), 2013 10th IEEE Working Conference on; 01/2013
  • E. Bouwers, A. van Deursen, J. Visser
    ABSTRACT: A wide range of software metrics targeting various abstraction levels and quality attributes have been proposed by the research community. For many of these metrics the evaluation consists of verifying the mathematical properties of the metric, investigating the behavior of the metric for a number of open-source systems or comparing the value of the metric against other metrics quantifying related quality attributes. Unfortunately, a structural analysis of the usefulness of metrics in a real-world evaluation setting is often missing. Such an evaluation is important to understand the situations in which a metric can be applied, to identify areas of possible improvements, to explore general problems detected by the metrics and to define generally applicable solution strategies. In this paper we execute such an analysis for two architecture level metrics, Component Balance and Dependency Profiles, by analyzing the challenges involved in applying these metrics in an industrial setting. In addition, we explore the usefulness of the metrics by conducting semi-structured interviews with experienced assessors. We document the lessons learned both for the application of these specific metrics, as well as for the method of evaluating metrics in practice.
    Software Engineering (ICSE), 2013 35th International Conference on; 01/2013
  • Felienne Hermans, Martin Pinzger, Arie van Deursen
    ABSTRACT: Spreadsheets are widely used in industry, because they are flexible and easy to use. Sometimes they are even used for business-critical applications. It is however difficult for spreadsheet users to correctly assess the quality of spreadsheets, especially with respect to their understandability. Understandability of spreadsheets is important, since spreadsheets often have a long lifespan, during which they are used by several users. In this paper, we establish a set of spreadsheet understandability metrics. We start by studying related work and interviewing 40 spreadsheet professionals to obtain a set of characteristics that might contribute to understandability problems in spreadsheets. Based on those characteristics we subsequently determine a number of understandability metrics. To evaluate the usefulness of our metrics, we conducted a series of experiments in which professional spreadsheet users performed a number of small maintenance tasks on a set of spreadsheets from the EUSES spreadsheet corpus. We subsequently calculate the correlation between the metrics and the performance of subjects on these tasks. The results clearly indicate that the number of ranges, the nesting depth and the presence of conditional operations in formulas significantly increase the difficulty of understanding a spreadsheet.
    09/2012;
  • Michaela Greiler, Arie van Deursen, Andy Zaidman
    ABSTRACT: In order to support test suite understanding, we investigate whether we can automatically derive relations between test cases. In particular, we search for trace-based similarities between (high-level) end-to-end tests on the one hand and fine grained unit tests on the other. Our approach uses the shared word count metric to determine similarity. We evaluate our approach in two case studies and show which relations between end-to-end and unit tests are found by our approach, and how this information can be used to support test suite understanding.
    Proceedings of the 50th international conference on Objects, Models, Components, Patterns; 05/2012
  • Eric Bouwers, Joost Visser, Arie van Deursen
    ABSTRACT: Four common pitfalls in using software metrics for project management.
    Queue 05/2012;
  • Ali Mesbah, Arie van Deursen, Stefan Lenselink
    ABSTRACT: Using JavaScript and dynamic DOM manipulation on the client side of Web applications is becoming a widespread approach for achieving rich interactivity and responsiveness in modern Web applications. At the same time, such techniques---collectively known as Ajax---shatter the concept of webpages with unique URLs, on which traditional Web crawlers are based. This article describes a novel technique for crawling Ajax-based applications through automatic dynamic analysis of user-interface-state changes in Web browsers. Our algorithm scans the DOM tree, spots candidate elements that are capable of changing the state, fires events on those candidate elements, and incrementally infers a state machine that models the various navigational paths and states within an Ajax application. This inferred model can be used in program comprehension and in analysis and testing of dynamic Web states, for instance, or for generating a static version of the application. In this article, we discuss our sequential and concurrent Ajax crawling algorithms. We present our open source tool called Crawljax, which implements the concepts and algorithms discussed in this article. Additionally, we report a number of empirical studies in which we apply our approach to a number of open-source and industrial Web applications and elaborate on the obtained results.
    ACM Transactions on The Web - TWEB. 01/2012;
  • ABSTRACT: Testing plug-in-based systems is challenging due to complex interactions among many different plug-ins, and variations in version and configuration. The objective of this paper is to increase our understanding of what testers and developers think and do when it comes to testing plug-in-based systems. To that end, we conduct a qualitative (grounded theory) study, in which we interview 25 senior practitioners about how they test plug-in applications based on the Eclipse plug-in architecture. The outcome is an overview of the testing practices currently used, a set of identified barriers limiting test adoption, and an explanation of how limited testing is compensated by self-hosting of projects and by involving the community. These results are supported by a structured survey of more than 150 professionals. The study reveals that unit testing plays a key role, whereas plug-in specific integration problems are identified and resolved by the community. Based on our findings, we propose a series of recommendations and areas for future research.
    Proceedings - International Conference on Software Engineering 01/2012;

Publication Stats

4k Citations
34.47 Total Impact Points

Institutions

  • 2003–2014
    • Delft University of Technology
      • Faculty of Electrical Engineering, Mathematics and Computer Sciences (EEMCS)
      Delft, South Holland, Netherlands
  • 2008
    • ASML
      Veldhoven, North Brabant, Netherlands
  • 2005
    • Durham University
      Durham, England, United Kingdom
  • 2000–2005
    • Centrum Wiskunde & Informatica
      Amsterdam, North Holland, Netherlands
  • 1997
    • Technische Universiteit Eindhoven
      Eindhoven, North Brabant, Netherlands
    • University of Amsterdam
      Amsterdam, North Holland, Netherlands