Jurgen Vinju

Centrum Wiskunde & Informatica | CWI · Research Group for Software Analysis and Transformation

Prof. Computer Science

About

152
Publications
14,078
Reads
2,031
Citations
Introduction
I'm interested in programming languages, in particular meta-programming and its application to software engineering in the form of reverse engineering, software analysis, software analytics, and model-driven engineering via domain-specific languages.
Additional affiliations
September 2014 - present
Eindhoven University of Technology
Position
  • Part-time Full Professor Automated Software Analysis
September 2005 - August 2014
University of Amsterdam
Position
  • Lecturer
Description
  • Software Evolution and Software Construction master's courses; supervision of master's theses
February 2000 - present
Centrum Wiskunde & Informatica
Position
  • Group Leader

Publications

Conference Paper
Full-text available
We compare the Visitor pattern with the Interpreter pattern, investigating a single case in point for the Java language. We have produced and compared two versions of an interpreter for a programming language. The first version makes use of the Visitor pattern. The second version was obtained by using an automated refactoring to transform uses of t...
Article
Measuring the internal quality of source code is one of the traditional goals of making software development into an engineering discipline. Cyclomatic Complexity (CC) is an often used source code quality metric, next to Source Lines of Code (SLOC). However, the use of the CC metric is challenged by the repeated claim that CC is redundant with resp...
Article
The data structures underpinning collection API (e.g. lists, sets, maps) in the standard libraries of programming languages are used intensively in many applications. The standard libraries of recent Java Virtual Machine languages, such as Clojure or Scala, contain scalable and well-performing immutable collection data structures that are implemen...
Article
Full-text available
Just like any software, libraries evolve to incorporate new features, bug fixes, security patches, and refactorings. However, when a library evolves, it may break the contract previously established with its clients by introducing Breaking Changes (BCs) in its API. These changes might trigger compile-time, link-time, or run-time errors in client co...
Article
Full-text available
This is an industrial experience report on a large semi‐automated migration of legacy test code in C and C++. The particular migration was enabled by automating most of the maintenance steps. Without automation this particular large‐scale migration would not have been conducted, due to the risks involved in manual maintenance (risk of introducing e...
Preprint
Just like any software, libraries evolve to incorporate new features, bug fixes, security patches, and refactorings. However, when a library evolves, it may break the contract previously established with its clients by introducing Breaking Changes (BCs) in its API. These changes might trigger compile-time, link-time, or run-time errors in client co...
Preprint
Full-text available
Block-based environments are visual programming environments, which are becoming more and more popular because of their ease of use. The ease of use comes thanks to their intuitive graphical representation and structural metaphors (jigsaw-like puzzles) to display valid combinations of language constructs to the users. Part of the current popularity...
Chapter
Clear consistency guarantees on data are paramount for the design and implementation of distributed systems. When implementing distributed applications, developers require approaches to verify the data consistency guarantees of an implementation choice. Crooks et al. define a state-based and client-centric model of database isolation. This paper fo...
Preprint
Full-text available
Context: Computational notebooks are a contemporary style of literate programming, in which users can communicate and transfer knowledge by interleaving executable code, output, and prose in a single rich document. A Domain-Specific Language (DSL) is an artificial software language tailored for a particular application domain. Usually, DSL users ar...
Conference Paper
Relational model finding is a successful technique which has been used in a wide range of problems during the last decade. This success is partly due to the fact that many problems contain relational structures which can be explored using relational model finders. Although these model finders allow for the exploration of such structures they often...
Conference Paper
Full-text available
In high-throughput, distributed systems, such as large-scale banking infrastructure, synchronization between actors becomes a bottleneck in high-contention scenarios. This results in delays for users, and reduces opportunities for scaling such systems. This paper proposes Static Local Coordination Avoidance, which analyzes application invariants a...
Preprint
Full-text available
Concurrent objects with asynchronous messaging are an increasingly popular way to structure highly available, high performance, large-scale software systems. To ensure data-consistency and support synchronization between objects such systems often use an atomic commitment protocol such as Two-Phase commit (2PC). In highly available, high-throughput...
Article
Full-text available
Context: Meta programming consists for a large part of matching, analyzing, and transforming syntax trees. Many meta programming systems process abstract syntax trees, but this requires intimate knowledge of the structure of the data type describing the abstract syntax. As a result, meta programming is error-prone, and meta programs are not resilie...
Conference Paper
Interactive notebooks allow people to communicate and collaborate through a single rich document that might include live code, multimedia, computed results, and documentation, which is persisted as a whole for reproducibility. Notebooks are currently being used extensively in domains such as data science, data journalism, and machine learning. Howe...
Conference Paper
An immutable multi-map is a many-to-many map data structure with expected fast insert and lookup operations. This data structure is used for applications processing graphs or many-to-many relations as applied in compilers, runtimes of programming languages, or in static analysis of object-oriented systems. Collection data structures are assumed to...
Article
An immutable multi-map is a many-to-many map data structure with expected fast insert and lookup operations. This data structure is used for applications processing graphs or many-to-many relations as applied in compilers, runtimes of programming languages, or in static analysis of object-oriented systems. Collection data structures are assumed to...
Conference Paper
OSGi is a module system and service framework that aims to fill Java's lack of support for modular development. Using OSGi, developers divide software into multiple bundles that declare constrained dependencies towards other bundles. However, there are various ways of declaring and managing such dependencies, and it can be confusing for developers...
Conference Paper
Full-text available
Interactive notebooks, such as provided by the Jupyter platform [2], are gaining traction in scientific computing, data science, and machine learning. Developing a Jupyter kernel machinery for a new language, however, requires considerable effort. In this extended abstract, we present Bacatá, a language-parametric bridge between Jupyter and the Ras...
Article
During the preparation of the corresponding chapter in Davy Landman's PhD thesis, some minor graphical and statistical discrepancies were found in the paper “Empirical analysis of the relationship between CC and SLOC in a large corpus of Java methods and C functions.” To support future reproduction and use of this work, we prepared the...
Article
Welcome to the sixth special issue on Experimental Software and Toolkits (EST) of Elsevier’s Science of Computer Programming journal. The EST series aims at allowing academic software developers to publish the software systems they developed with similar rigour as academic papers. Peer reviewers evaluated not only a scientific paper about the devel...
Conference Paper
There are many declarative frameworks that allow us to implement code formatters relatively easily for any specific language, but constructing them is cumbersome. The first problem is that “everybody” wants to format their code differently, leading to either many formatter variants or a ridiculous number of configuration options. Second, the size o...
Conference Paper
Large organizations like banks suffer from the ever growing complexity of their systems. Evolving the software becomes harder and harder since a single change can affect a much larger part of the system than predicted upfront. A large contributing factor to this problem is that the actual domain knowledge is often implicit, incomplete, or out of da...
Conference Paper
Collection data structures in standard libraries of programming languages are designed to excel for the average case by carefully balancing memory footprint and runtime performance. These implicit design decisions and hard-coded trade-offs do constrain users from using an optimal variant for a given problem. Although a wide range of specialized col...
Article
Collection data structures in standard libraries of programming languages are designed to excel for the average case by carefully balancing memory footprint and runtime performance. These implicit design decisions and hard-coded trade-offs do constrain users from using an optimal variant for a given problem. Although a wide range of specialized col...
Article
Full-text available
An immutable multi-map is a many-to-many thread-friendly map data structure with expected fast insert and lookup operations. This data structure is used for applications processing graphs or many-to-many relations as applied in static analysis of object-oriented systems. When processing such big data sets the memory overhead of the data structure e...
Article
Full-text available
There are many declarative frameworks that allow us to implement code formatters relatively easily for any specific language, but constructing them is cumbersome. The first problem is that "everybody" wants to format their code differently, leading to either many formatter variants or a ridiculous number of configuration options. Second, the size o...
Article
Today, PHP is one of the most popular programming languages, and is commonly used in the open source community and in industry to build large application frameworks and web applications. In this paper, we discuss our ongoing work on PHP AiR, a framework for PHP Analysis in Rascal. PHP AiR is focused especially on program analysis and empirical soft...
Conference Paper
Software fault prediction promises to be a powerful tool in supporting test engineers upon their decision where to define testing hotspots. However, there are limitations on a cross project prediction and a lack of reports upon application to industrial software, as well as the power of metrics to represent bugs. In this paper, we present a novel a...
Conference Paper
It is noticeably hard to predict the effect of optimization strategies in Java without implementing them. "Maximal sharing" (a.k.a. "hash-consing") is one of these strategies that may have great benefit in terms of time and space, or may have detrimental overhead. It all depends on the redundancy of data and the use of equality. We used a combinati...
Conference Paper
Full-text available
Software is fundamental to academic research work, both as part of the method and as the result of research. In June 2016 25 people gathered at Schloss Dagstuhl for a week-long Perspectives Workshop and began to develop a manifesto which places emphasis on the scholarly value of academic software and on personal responsibility. Twenty pledges cover...
Conference Paper
In grammar-based testing, context-free grammars may be used to generate relevant test inputs for language processors, or meta programs, such as programming language compilers, refactoring tools, and implementations of software quality metrics. This technique can be used to test these meta programs, but the amount of sentences, and syntax trees ther...
Article
All software evolves, and programming languages and programming language tools are no exception. And just like in ordinary software construction, modular implementations can help ease the process of changing a language implementation and its dependent tools. However, the syntactic and semantic dependencies between language features make this a chal...
Conference Paper
Full-text available
Deciding whether an open source software (OSS) project meets the required standards for adoption in terms of quality, maturity, activity of development and user support is not a straightforward process as it involves exploring various sources of information. Such sources include OSS source code repositories, communication channels such as newsgroup...
Article
This short paper introduces M3, a simple and extensible model for capturing facts about source code for future analysis. M3 is a core part of the standard library of the Rascal meta programming language. We motivate it, position it to related work and detail the key design aspects.
Conference Paper
Measuring the internal quality of source code is one of the traditional goals of making software development into an engineering discipline. Cyclomatic Complexity (CC) is an often used source code quality metric, next to Source Lines of Code (SLOC). However, the use of the CC metric is challenged by the repeated claim that CC is redundant with resp...
Article
Full-text available
The hash trie data structure is a common part in standard collection libraries of JVM programming languages such as Clojure and Scala. It enables fast immutable implementations of maps, sets, and vectors, but it requires considerably more memory than an equivalent array-based data structure. This hinders the scalability of functional programs and t...
Article
Full-text available
Dynamic languages include a number of features that are challenging to model properly in static analysis tools. In PHP, one of these features is the include expression, where an arbitrary expression provides the path of the file to include at runtime. In this paper we present two complementary analyses for statically resolving PHP includes, one tha...
Book
This book constitutes the refereed proceedings of the 7th International Conference on Software Language Engineering, SLE 2014, held in Västerås, Sweden, in September 2014. The 19 revised full papers presented together with 1 invited paper were carefully reviewed and selected from 61 initial submissions. The papers observe software languages from di...
Article
Full-text available
This document details design considerations of M3: a meta model for source code artifacts
Article
Software projects consist of different kinds of artifacts: build files, configuration files, markup files, source code in different software languages, and so on. At the same time, however, most integrated development environments (IDEs) are focused on a single (programming) language. Even if a programming environment supports multiple languages (e...
Conference Paper
In this paper we present an approach to specifying operator precedence based on declarative disambiguation constructs and an implementation mechanism based on grammar rewriting. We identify a problem with existing generalized context-free parsing and disambiguation technology: generating a correct parser for a language such as OCaml using declarati...
Article
Attribute grammars are a powerful specification paradigm for many language processing tasks, particularly semantic analysis of programming languages. Recent attribute grammar systems use dynamic scheduling algorithms to evaluate attributes on demand. ...
Conference Paper
Full-text available
On behalf of the SCAM 2013 Conference and Program Committee, we would like to welcome you to the capital of Dutch industrial design, i.e., Eindhoven, the Netherlands, for the 13th IEEE International Working Conference on Source Code Analysis and Manipulation, co-located with the 29th IEEE International Conference on Software Maintenance (ICSM 2013)...
Conference Paper
We are interested in re-engineering families of legacy applications towards using Domain-Specific Languages (DSLs). Is it worth to invest in harvesting domain knowledge from the source code of legacy applications? Reverse engineering domain knowledge from source code is sometimes considered very hard or even impossible. Is it also difficult for "m...
Conference Paper
PHP is one of the most popular languages for server-side application development. The language is highly dynamic, providing programmers with a large amount of flexibility. However, these dynamic features also have a cost, making it difficult to apply traditional static analysis techniques used in standard code analysis and transformation tools. As...
Conference Paper
Full-text available
Meta-programming applications often require access to heterogeneous sources of information, often from different technological spaces (grammars, models, ontologies, databases), that have specialized ways of defining their respective data schemas. Without direct language support, obtaining typed access to this external, potentially changing, informa...
Conference Paper
Assessing the understandability of source code remains an elusive yet highly desirable goal for software developers and their managers. While many metrics have been suggested and investigated empirically, the McCabe cyclomatic complexity metric (CC) - which is based on control flow complexity - seems to hold enduring fascination within both industr...
Conference Paper
To facilitate experimentation with creating new, complex refactorings, we want to reuse existing transformation and analysis code as orchestrated parts of a larger refactoring: i.e., to script refactorings. The language we use to perform this scripting must be able to deal with the diversity of languages, tools, analyses, and transformations that a...
Conference Paper
Rascal is a meta programming language focused on the implementation of domain-specific languages and on the rapid construction of tools for software analysis and software transformation. In this paper we focus on the use of Rascal for software analysis. We illustrate a range of scenarios for building new software analysis tools through a number of...
Conference Paper
Presents the welcome message from the conference proceedings.
Conference Paper
Full-text available
The Rascal meta-programming language provides a number of features supporting the development of program analysis tools. However, sometimes the analysis to be developed is already implemented by another system. In this case, Rascal can provide a useful front-end for this system, handling the parsing of the input program, any transformation (if need...
Conference Paper
Full-text available
In this paper we propose and evaluate a method for locating causes of ambiguity in context-free grammars by automatic analysis of parse forests. A parse forest is the set of parse trees of an ambiguous sentence. Deducing causes of ambiguity from observing parse forests is hard for grammar engineers because of (a) the size of the parse forests, (b)...
Conference Paper
Full-text available
Static ambiguity detection would be an important aspect of language workbenches for textual software languages. However, the challenge is that automatic ambiguity detection in context-free grammars is undecidable in general. Sophisticated approximations and optimizations do exist, but these do not scale to grammars for so-called "scannerless parser...
Article
Full-text available
Algebraic specification has a long tradition in bridging the gap between specification and programming by making specifications executable. Building on extensive experience in designing, implementing and using specification formalisms that are based on algebraic specification and term rewriting (namely Asf and Asf+Sdf), we are now focusing on using...
Presentation
For IFIP WG 2.3 Software Implementation Technology
Conference Paper
Full-text available
Model-driven software development (MDSD) has been on the rise over the past few years and is becoming more and more mature. However, evaluation in real-life industrial context is still scarce. In this paper, we present a case-study evaluating the applicability of a state-of-the-art MDSD tool, modJ, a suite of domain specific languages (DSLs) for d...
Article
Full-text available
In this paper we present prototype tool-support for the runtime assertion checking of the Java Modeling Language (JML) extended with communication histories specified by attribute grammars. Our tool suite integrates Rascal, a meta programming language and ANTLR, a popular parser generator. Rascal instantiates a generic model of history updates for...
Article
Full-text available
Does the use of DSL tools improve the maintainability of language implementations compared to implementations from scratch? We present empirical results on aspects of maintainability of six implementations of the same DSL using different languages (Java, JavaScript, C#) and DSL tools (ANTLR, OMeta, Microsoft "M"). Our evaluation indicates that the...