Conference Paper

ReduKtor: How We Stopped Worrying About Bugs in Kotlin Compiler


... The Kotlin compiler was also the target of researchers. Stepanov, Akhin, and Belyaev [253] focused on creating an automatic input reduction tool for the Kotlin compiler to simplify the debugging process. The approach is based on a combination of Kotlin-specific transformations, program slicing, and Hierarchical Delta Debugging (HDD) [194]. ...
... Three studies focused on improving the Kotlin compiler [34,33,253]. Bryksin et al. [34,33] investigated code anomalies in Kotlin and whether these anomalies could improve the Kotlin compiler. ...
... Bryksin et al. [34,33] investigated code anomalies in Kotlin and whether these anomalies could improve the Kotlin compiler. Stepanov, Akhin, and Belyaev [253] proposed a tool to perform input reduction, which simplifies the bug localization process. Finally, Tankov, Golubev, and Bryksin [258] proposed a framework for the development of web services. ...
Thesis
In recent years, with more than 3 million applications on its official store, Google’s Android has dominated the market of mobile operating systems worldwide. Despite this success, Google has continued evolving its operating system and its toolkits to ease application development. In 2017, Google declared Kotlin an official Android programming language. More recently, during Google I/O 2019, Google announced that Android had become ‘Kotlin-first’, which means that new APIs, libraries, and documentation will target Kotlin and eventually Java, and that Kotlin is the preferred language for creating new Android applications. Kotlin is a programming language that runs on the Java Virtual Machine (JVM), and it is fully interoperable with Java because both languages are compiled to JVM bytecode. Because of this characteristic, Android developers do not need to migrate their existing applications to Kotlin to start using Kotlin in them. Moreover, Kotlin offers a different approach to writing applications because it combines object-oriented and functional features. Therefore, we hypothesize that the adoption of Kotlin by developers may affect different aspects of Android application development. However, one year after this first announcement, there were no studies in the literature about Kotlin. In this thesis, we conducted a series of empirical studies to address this gap and build a better understanding of creating high-quality Android applications using Kotlin. First, we carried out a study to measure the degree of adoption of Kotlin. Our results showed that 11% of the studied Android applications had adopted Kotlin. Then, we analyzed how the adoption of Kotlin impacted the quality of Android applications in terms of code smells. We found that the introduction of Kotlin in Android applications initially written in Java produces a rise in the quality scores from 50% to 80% according to the code smell considered.
We analyzed the evolution of the usage of features introduced by Kotlin, such as smart casts, and how the amount of Kotlin code changes as applications evolve. We found that the number of instances of these features tends to grow throughout an application’s evolution. Finally, we focused on the migration of Android applications from Java to Kotlin. We found that 25% of the open source applications that were initially written in Java have entirely migrated to Kotlin, and for 19%, the migration was done gradually, over several versions, thanks to the interoperability between Java and Kotlin. This migration activity is challenging because: a) each migrated piece of code must be exhaustively tested after the migration to ensure it preserves the expected behavior; b) a project can be large, composed of several candidate files to be migrated. In this thesis, we present an approach to support migration, which suggests, given a version of an application written in Java and possibly Kotlin, the most convenient files to migrate. We evaluated our approach’s feasibility by applying two different machine learning techniques: classification and learning-to-rank. Our results showed that both techniques modestly outperform random approaches. Nevertheless, our approach is the first that proposes the use of machine learning to recommend file-level migrations. Therefore, our results define a baseline for future work. Since migrating from Java to Kotlin may positively impact an application’s maintenance, and since migration is time-consuming and challenging, developers may use our approach to select the files to be migrated first. Finally, we discuss several research perspectives opened by our results that can improve the experience of creating high-quality Android applications using Kotlin.
... Input reduction is implemented in order to make a small and understandable test from a possibly large and complex input leading to a compiler error. The approach we have implemented, called Reduktor, is described in detail in [12]. At a very high level, it is a hybrid approach that combines several language-agnostic and language-specific techniques. ...
... To further enhance mutation-based approaches, we decided to augment language-agnostic mutations with language-specific ones. These mutations consist of over 20 text- and tree-based transformations, which are based on our previous experience with Kotlin fuzzing and reduction [12]. ...
Preprint
Full-text available
Kotlin is a relatively new programming language from JetBrains: its development started in 2010 with release 1.0 done in early 2016. The Kotlin compiler, while slowly and steadily becoming more and more mature, still crashes from time to time on the more tricky input programs, not least because of the complexity of its features and their interactions. This makes it a great target for fuzzing, even the basic forms of which can find a significant number of Kotlin compiler crashes. There is a problem with fuzzing, however, closely related to the cause of the crashes: generating a random, non-trivial and semantically valid Kotlin program is hard. In this paper, we talk about type-centric compiler fuzzing in the form of type-centric enumeration, an approach inspired by skeletal program enumeration and based on a combination of generative and mutation-based fuzzing, which solves this problem by focusing on program types. After creating the skeleton program, we fill the typed holes with fragments of suitable type, created via generation and enhanced by semantic-aware mutation. We implemented this approach in our Kotlin compiler fuzzing framework called Backend Bug Finder (BBF) and did an extensive evaluation, not only testing the real-world feasibility of our approach, but also comparing it to other compiler fuzzing techniques. The results show our approach to be significantly better compared to other fuzzing approaches at generating semantically valid Kotlin programs, while creating more interesting crash-inducing inputs at the same time. We managed to find more than 50 previously unknown compiler crashes, of which 18 were considered important after their triage by the compiler team.
... More details about the Kotlin program reducer and its results can be found in the article [20]. In general, a complex approach using language-agnostic methods and language-specific transformations gives a good result. ...
Article
Introduction: The standard way to check the quality of a compiler is manual testing. However, manual testing cannot cover the vast diversity of programs that can be written in a target programming language. Today, in addition to manually written tests, there are many automated compiler testing methods, among which fuzzing is one of the most powerful and useful. A compiler fuzzer is a tool that generates a random program in a target language and checks how the compiler handles it. Purpose: To develop a platform for compiler fuzzing and, based on it, a tool for Kotlin compiler testing. Results: We have developed Backend Bug Finder, a platform for compiler fuzzing. We have chosen a mutation-based approach as the method for generating random programs. First, an existing program is passed to the mutator as input and transformed in some way. Mutations can be either trivial, for example, replacing arithmetic operators with others, or complex, changing the structure of the program. Next, the resulting program is fed to the compiler, and the compiler's behavior is checked. The developed test oracle can detect three types of errors: crashes, miscompilations, and performance degradations. If an error is detected, the test case is fed into the post-processing module, where reduction and deduplication algorithms are applied. To evaluate the platform, we built on it a tool for fuzzing the Kotlin compiler, which showed the applicability of the proposed approach for finding errors in modern compilers. Practical relevance: Over a year and a half of work, our tool has found thousands of different Kotlin compiler bugs, more than 200 of which were sent to the developers, and more than 80 have been fixed.
... Kotlin has more modern language features than Java. Kotlin is a pragmatic programming language that combines object-oriented (OO) and functional programming [12], [13]. Kotlin has attracted many developers because of its simple syntax and its main focus on mobile development in the beginning. ...
Article
Full-text available
This paper compares Java and Kotlin in the context of a web application framework. The criteria taken into account for testing purposes are: execution time, memory usage, CPU load, and database response in set time. A series of tests and their in-depth comparative analysis are carried out, together with code analysis, to draw comparative conclusions. In web framework performance, database response speed, and test implementation across the two languages, Kotlin proved to be less efficient. There is no significant difference in CPU load between individual measurements; the difference does not exceed 2%. The Kotlin implementation never achieved the best result in any group of measurements.
Chapter
Full-text available
Test cases play an important role in testing and debugging software. Smaller tests are easier to understand and use for these tasks. Given a test that demonstrates a bug, test case reduction finds a smaller variant of the test case that exhibits the same bug. Classically, one of the challenges for test case reduction is that the process is slow, often taking hours. For hierarchically structured inputs like source code, the state of the art is Perses, a recent grammar aware and queue driven approach for test case reduction. Perses traverses nodes in the abstract syntax tree (AST) of a program (test case) based on a priority order and tries to reduce them while preserving syntactic validity.
Conference Paper
Full-text available
Test case reduction has been automated since the introduction of the minimizing Delta Debugging algorithm, but improving the efficiency of reduction is still the focus of research. This paper focuses on Hierarchical Delta Debugging, already an improvement over the original technique, and describes how its input tree and caching approach can be changed for higher efficiency. The proposed optimizations were evaluated on artificial and real test cases of 6 different input formats, and achieved an average 45% drop in the number of testing steps needed to reach the minimized results, with the best improvement being as high as 82%, giving a more than 5-fold speedup.
Conference Paper
Full-text available
Current slicing techniques cannot handle systems written in multiple programming languages. Observation-Based Slicing (ORBS) is a language-independent slicing technique capable of slicing multi-language systems, including systems which contain (third party) binary components. A potential slice obtained through repeated statement deletion is validated by observing the behaviour of the program: if the slice and original program behave the same under the slicing criterion, the deletion is accepted. The resulting slice is similar to a dynamic slice. We evaluate five variants of ORBS on ten programs of different sizes and languages showing that it is less expensive than similar existing techniques. We also evaluate it on bash and four other systems to demonstrate feasible large-scale operation in which a parallelised ORBS needs up to 82% less time when using four threads. The results show that an ORBS slicer is simple to construct, effective at slicing, and able to handle systems written in multiple languages without specialist analysis tools.
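The deletion-and-observation loop described in this abstract can be illustrated with a minimal sketch. It is not the ORBS tool itself: the function names `observe` and `orbs_slice`, the `max_window` parameter, and the choice to "build and run" a program by calling Python's built-in `exec` on a list of source lines are all assumptions made for illustration. It also assumes every candidate terminates; a real slicer would need a timeout.

```python
from typing import List, Optional, Tuple


def observe(lines: List[str], variable: str) -> Optional[Tuple[str, object]]:
    """Run the candidate program and report the final value of `variable`.

    Returns None when the candidate does not compile, crashes, or never
    defines the variable, so such candidates can never match the original.
    """
    env: dict = {}
    try:
        exec("\n".join(lines), env)
    except Exception:
        return None
    if variable not in env:
        return None
    return ("ok", env[variable])


def orbs_slice(lines: List[str], variable: str, max_window: int = 2) -> List[str]:
    """Repeatedly delete windows of lines while the observed value is unchanged."""
    expected = observe(lines, variable)
    assert expected is not None, "the original program must define the criterion"
    current = list(lines)
    changed = True
    while changed:  # iterate until a full pass deletes nothing
        changed = False
        i = 0
        while i < len(current):
            for width in range(1, max_window + 1):
                candidate = current[:i] + current[i + width:]
                if observe(candidate, variable) == expected:
                    current = candidate  # deletion accepted
                    changed = True
                    break
            else:
                i += 1  # no window starting here could be deleted
    return current


program = ["a = 1", "b = 2", "c = a + 5", "d = b * 3", "e = c + 1"]
slice_for_d = orbs_slice(program, "d")
```

Because the check is purely behavioural, the sketch needs no dependence analysis, which mirrors why the abstract calls the technique language-independent.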
Article
Full-text available
What is a test case for? Sometimes, to expose a fault. Tests can also exercise code, use memory or time, or produce desired output. Given a desired effect, a test case can be seen as a cause, and its components divided into essential (required for effect) and accidental. Delta debugging is used for removing accidents from failing test cases, producing smaller test cases that are easier to understand. This paper extends delta debugging by simplifying test cases with respect to arbitrary effects, a generalization called cause reduction. Suites produced by cause reduction provide effective quick tests for real-world programs. For Mozilla's JavaScript engine, the reduced suite is possibly more effective for finding faults. The effectiveness of reduction-based suites persists through changes to the software, improving coverage by over 500 branches for versions up to 4 months later. Cause reduction has other applications, including improving seeded symbolic execution, where using reduced tests can often double the number of additional branches explored. Copyright © 2015 John Wiley & Sons, Ltd.
Article
Full-text available
To report a compiler bug, one must often find a small test case that triggers the bug. The existing approach to automated test-case reduction, delta debugging, works by removing substrings of the original input; the result is a concatenation of substrings that delta cannot remove. We have found this approach less than ideal for reducing C programs because it typically yields test cases that are too large or even invalid (relying on undefined behavior). To obtain small and valid test cases consistently, we designed and implemented three new, domain-specific test-case reducers. The best of these is based on a novel framework in which a generic fixpoint computation invokes modular transformations that perform reduction operations. This reducer produces outputs that are, on average, more than 25 times smaller than those produced by our other reducers or by the existing reducer that is most commonly used by compiler developers. We conclude that effective program reduction requires more than straightforward delta debugging.
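The "generic fixpoint computation invokes modular transformations" idea from this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' reducer: the two passes (`drop_one_line`, `shrink_integer_literals`), the function `reduce_to_fixpoint`, and the toy interestingness test are hypothetical.

```python
import re
from typing import Callable, Iterable, Iterator, List


def drop_one_line(program: str) -> Iterator[str]:
    """Pass 1 (hypothetical): propose every variant with one line removed."""
    lines = program.split("\n")
    for i in range(len(lines)):
        yield "\n".join(lines[:i] + lines[i + 1:])


def shrink_integer_literals(program: str) -> Iterator[str]:
    """Pass 2 (hypothetical): propose replacing a multi-digit literal with 0."""
    for match in re.finditer(r"\d{2,}", program):
        yield program[:match.start()] + "0" + program[match.end():]


def reduce_to_fixpoint(
    program: str,
    passes: List[Callable[[str], Iterable[str]]],
    interesting: Callable[[str], bool],
) -> str:
    """Apply the passes round-robin until none of them can shrink the program."""
    changed = True
    while changed:
        changed = False
        for transform in passes:
            for candidate in transform(program):
                # Accept only strictly smaller variants that keep the property.
                if len(candidate) < len(program) and interesting(candidate):
                    program = candidate
                    changed = True
                    break
    return program


original = "x = 1234\ny = 5\nboom(x)"
still_fails = lambda p: "boom(x)" in p and "x =" in p  # stand-in for "compiler crashes"
reduced = reduce_to_fixpoint(original, [drop_one_line, shrink_integer_literals], still_fails)
```

Keeping each transformation as an independent generator is what makes the design modular: a new domain-specific pass can be added to the list without touching the driver loop.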
Conference Paper
Full-text available
This tool paper describes a modular program slicer for Java built using the Indus program analysis framework along with its Eclipse-based user interface called Kaveri. Indus provides a library of classes that enables users to quickly assemble a highly customized non-system dependence graph based inter-procedural program slicer capable of slicing concurrent Java programs. Kaveri is an Eclipse plugin that relies on the above library to deliver program slicing to the Eclipse platform. Apart from the basic feature for generating program slices from within Eclipse along with an intuitive UI to view the slice, the plugin also provides the capability for chasing various dependences in the application to understand the slice.
Conference Paper
Full-text available
Randomized unit test cases can be very effective in detecting defects. In practice, however, failing test cases often comprise long sequences of method calls that are tiresome to reproduce and debug. We present a combination of static slicing and delta debugging that automatically minimizes the sequence of failure-inducing method calls. In a case study on the EiffelBase library, the strategy minimizes failing unit test cases on average by 96%. This approach improves on the state of the art by being far more efficient: in contrast to the approach of Lei and Andrews, who use delta debugging alone, our case study found slicing to be 50× faster, while providing comparable results. The combination of slicing and delta debugging gives the best results and is 11× faster. Categories and Subject Descriptors: D.2.5 (Software Engineering) Testing and Debugging-Testing tools (e.g., data generators, coverage testing)
Conference Paper
Given a program P that exhibits a certain property Ψ (e.g., a C program that crashes GCC when it is being compiled), the goal of program reduction is to minimize P to a smaller variant P′ that still exhibits the same property, i.e., Ψ(P′). Program reduction is important and widely demanded for testing and debugging. For example, all compiler/interpreter development projects need effective program reduction to minimize failure-inducing test programs to ease debugging. However, state-of-the-art program reduction techniques --- notably Delta Debugging (DD), Hierarchical Delta Debugging (HDD), and C-Reduce --- do not perform well in terms of speed (reduction time) and quality (size of reduced programs), or are highly customized for certain languages and thus lack generality. This paper presents Perses, a novel framework for effective, efficient, and general program reduction. The key insight is to exploit, in a general manner, the formal syntax of the programs under reduction and ensure that each reduction step considers only smaller, syntactically valid variants to avoid futile efforts on syntactically invalid variants. Our framework supports not only deletion (as for DD and HDD), but also general, effective program transformations. We have designed and implemented Perses, and evaluated it for two language settings: C and Java. Our evaluation results on 20 C programs triggering bugs in GCC and Clang demonstrate Perses's strong practicality compared to the state-of-the-art: (1) smaller size --- Perses's results are respectively 2% and 45% in size of those from DD and HDD; and (2) shorter reduction time --- Perses takes 23% and 47% time taken by DD and HDD respectively. Even when compared to the highly customized and optimized C-Reduce for C/C++, Perses takes only 38-60% reduction time.
Conference Paper
Program slicing is a technique for abstracting from a program according to slicing criteria, for use in maintenance, debugging, and other applications. Java, like other programming languages, has been a target for slicing tools such as Kaveri and WALA. In this paper, we present a new Java backward slicing tool, JavaBST. JavaBST proposes a new way of reading Java code and deriving dependencies from the syntax to produce a backward slice for each variable in the program. It succeeds in producing slices, but it is still too simple a tool to deal with complicated programs.
Article
Compilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to compiler developers. Every compiler we tested was found to crash and also to silently generate wrong code when presented with valid input. In this paper we present our compiler-testing tool and the results of our bug-hunting study. Our first contribution is to advance the state of the art in compiler testing. Unlike previous tools, Csmith generates programs that cover a large subset of C while avoiding the undefined and unspecified behaviors that would destroy its ability to automatically find wrong-code bugs. Our second contribution is a collection of qualitative and quantitative results about the bugs we have found in open-source C compilers.
Conference Paper
Book
Programmers run into parsing problems all the time. Whether it's a data format like JSON, a network protocol like SMTP, a server configuration file for Apache, a PostScript/PDF file, or a simple spreadsheet macro language, ANTLR v4 and this book will demystify the process. ANTLR v4 has been rewritten from scratch to make it easier than ever to build parsers and the language applications built on top. This completely rewritten new edition of the bestselling Definitive ANTLR Reference shows you how to take advantage of these new features. Build your own languages with ANTLR v4, using ANTLR's new advanced parsing technology. In this book, you'll learn how ANTLR automatically builds a data structure representing the input (parse tree) and generates code that can walk the tree (visitor). You can use that combination to implement data readers, language interpreters, and translators. You'll start by learning how to identify grammar patterns in language reference manuals and then slowly start building increasingly complex grammars. Next, you'll build applications based upon those grammars by walking the automatically generated parse trees. Then you'll tackle some nasty language problems by parsing files containing more than one language (such as XML, Java, and Javadoc). You'll also see how to take absolute control over parsing by embedding Java actions into the grammar. You'll learn directly from well-known parsing expert Terence Parr, the ANTLR creator and project lead. You'll master ANTLR grammar construction and learn how to build language tools using the built-in parse tree visitor mechanism. The book teaches using real-world examples and shows you how to use ANTLR to build such things as a data file reader, a JSON to XML translator, an R parser, and a Java class-interface extractor. This book is your ticket to becoming a parsing guru! What You Need: ANTLR 4.0 and above. Java development tools. Ant build system optional (needed for building ANTLR from source).
Article
When a component in a large system fails, developers encounter two problems: (1) reproducing the failure, and (2) investigating the causes of such a failure. Our JINSI tool lets developers capture and replay the interactions between a component and its environment, thus allowing for reproducing the failure at will. In addition, JINSI uses delta debugging to automatically isolate the subset of the interactions that is relevant for the failure. In a first study, JINSI has successfully isolated the relevant interaction of a Java component: "Out of the 32 interactions with the component, seven interactions suffice to produce the failure."
Article
This paper introduces program chipping, a simple yet effective technique to isolate bugs. This technique automatically removes or chips away parts of a program so that the part that contributes to some symptomatic output becomes more apparent to the user. Program chipping is similar in spirit to traditional program slicing and debugging techniques, but chipping uses very simple techniques based on the syntactic structure of the program. We have developed a chipping tool for Java programs, called ChipperJ, and have run it on a variety of small to large programs, including a Java compiler, looking for various symptoms. The results are promising. The reduced program is generally about 20–35% of the size of the original. ChipperJ takes less than an hour on large programs to perform this reduction; even if it took overnight, that would be reasonable if it saves the developer time. Copyright © 2006 John Wiley & Sons, Ltd.
Article
The problems of finding a longest common subsequence of two sequences A and B and a shortest edit script for transforming A into B have long been known to be dual problems. In this paper, they are shown to be equivalent to finding a shortest/longest path in an edit graph. Using this perspective, a simple O(ND) time and space algorithm is developed where N is the sum of the lengths of A and B and D is the size of the minimum edit script for A and B. The algorithm performs well when differences are small (sequences are similar) and is consequently fast in typical applications. The algorithm is shown to have O(N + D^2) expected-time performance under a basic stochastic model. A refinement of the algorithm requires only O(N) space, and the use of suffix trees leads to an O(N log N + D^2) time variation.
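To make the edit-graph view concrete, here is a minimal sketch of the basic greedy search that this abstract describes, computing only D, the size of the shortest edit script (insertions plus deletions). The function name `shortest_edit_script_length` and the use of a dictionary for the per-diagonal frontier are choices made for this illustration; the linear-space refinement and the suffix-tree variation are not shown.

```python
from typing import Sequence


def shortest_edit_script_length(a: Sequence, b: Sequence) -> int:
    """Return D, the minimum number of insertions and deletions turning a into b."""
    n, m = len(a), len(b)
    # furthest[k] is the largest x reached so far on diagonal k = x - y.
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # Choose whether to arrive on diagonal k by moving down (insertion)
            # or right (deletion), keeping whichever path reaches further.
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]
            else:
                x = furthest[k - 1] + 1
            y = x - k
            # Follow the "snake": free diagonal moves while elements match.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    raise AssertionError("unreachable: D never exceeds len(a) + len(b)")


distance = shortest_edit_script_length("ABCABBA", "CBABAC")  # 5
```

The length of a longest common subsequence follows from the duality mentioned in the abstract as (N - D) / 2, which is 4 for the example pair.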
Conference Paper
Inputs causing a program to fail are usually large and often contain information irrelevant to the failure. It thus helps debugging to simplify program inputs. The Delta Debugging algorithm is a general technique applicable to minimizing all failure-inducing inputs for more effective debugging. In this paper, we present HDD, a simple but effective algorithm that significantly speeds up Delta Debugging and increases its output quality on tree structured inputs such as XML. Instead of treating the inputs as one flat atomic list, we apply Delta Debugging to the very structure of the data. In particular, we apply the original Delta Debugging algorithm to each level of a program's input, working from the coarsest to the finest levels. We are thus able to prune the large irrelevant portions of the input early. All the generated input configurations are syntactically valid, reducing the number of inconclusive configurations that need to be tested and accordingly the amount of time spent simplifying. We have implemented HDD and evaluated it on a number of real failure-inducing inputs from the GCC and Mozilla bugzilla databases. Our Hierarchical Delta Debugging algorithm produces simpler outputs and takes orders of magnitude fewer test cases than the original Delta Debugging algorithm. It is able to scale to inputs of considerable size that the original Delta Debugging algorithm cannot process in practice. We argue that HDD is an effective tool for automatic debugging of programs expecting structured inputs. Categories and Subject Descriptors: D.2.5 (Software Engineering): Testing and Debugging-Debugging aids, Testing tools.
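The level-by-level traversal described here can be sketched on a small tree. To keep the example short, the sketch swaps in a greedy one-node-at-a-time removal at each level in place of the full Delta Debugging algorithm that HDD actually applies; the `Node` class, the `render` format, and the function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    text: str
    children: List["Node"] = field(default_factory=list)
    removed: bool = False


def render(node: Node) -> str:
    """Serialize the tree, skipping removed subtrees, e.g. 'R(A(B,C),D)'."""
    kids = [render(child) for child in node.children if not child.removed]
    return node.text if not kids else f"{node.text}({','.join(kids)})"


def level_nodes(root: Node, depth: int) -> List[Node]:
    """Collect the nodes at `depth` that are still present in the tree."""
    frontier = [root]
    for _ in range(depth):
        frontier = [c for n in frontier for c in n.children if not c.removed]
    return frontier


def hierarchical_reduce(root: Node, fails: Callable[[str], bool]) -> Node:
    """Prune from the coarsest level to the finest; the root is always kept."""
    depth = 1
    while True:
        nodes = level_nodes(root, depth)
        if not nodes:
            return root
        for node in nodes:
            node.removed = True  # tentatively delete the whole subtree
            if not fails(render(root)):
                node.removed = False  # the subtree was needed; restore it
        depth += 1


tree = Node("R", [
    Node("A", [Node("B"), Node("C")]),
    Node("D", [Node("E"), Node("F")]),
    Node("G"),
])
result = render(hierarchical_reduce(tree, lambda text: "E" in text))
```

Removing `A` at the first level discards `B` and `C` in a single test, which is the early pruning of large irrelevant portions that the abstract refers to.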
Article
Program slicing is a method for automatically decomposing programs by analyzing their data flow and control flow. Starting from a subset of a program's behavior, slicing reduces that program to a minimal form which still produces that behavior. The reduced program, called a 'slice', is an independent program guaranteed to represent faithfully the original program within the domain of the specified subset of behavior. Some properties of slices are presented. In particular, finding statement-minimal slices is in general unsolvable, but using data flow analysis is sufficient to find approximate slices. Potential applications include automatic slicing tools for debugging and parallel processing of slices.
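As a rough illustration of how data flow information yields a slice, the sketch below computes a backward slice for straight-line code only. It is a deliberate simplification and not the algorithm from the paper: control flow is ignored, and each statement is modelled as a hypothetical pair of (defined variable, used variables).

```python
from typing import Iterable, List, Sequence, Set, Tuple

Statement = Tuple[str, Sequence[str]]  # (variable defined, variables used)


def backward_slice(statements: List[Statement], criterion: Iterable[str]) -> List[int]:
    """Return indices of statements that can affect the criterion variables
    at the end of a straight-line program."""
    relevant: Set[str] = set(criterion)
    kept: List[int] = []
    # Walk backwards, tracking which variables still need a definition.
    for index in range(len(statements) - 1, -1, -1):
        defined, used = statements[index]
        if defined in relevant:
            relevant.discard(defined)  # this definition satisfies the need
            relevant.update(used)      # but its inputs now matter
            kept.append(index)
    return sorted(kept)


# Models: a = 1; b = 2; c = a + 5; d = b * 3; e = c + 1
program = [("a", []), ("b", []), ("c", ["a"]), ("d", ["b"]), ("e", ["c"])]
slice_on_d = backward_slice(program, ["d"])  # [1, 3]
```

The result is an approximate slice in the sense of the abstract: it is safe for this restricted program form, but nothing here attempts statement-minimality.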
Conference Paper
Software debugging is the process of locating and correcting faulty code. Prior techniques to locate faulty code either use program analysis techniques such as backward dynamic program slicing or exclusively use delta debugging to analyze the state changes during program execution. In this paper, we present a new approach that integrates the potential of delta debugging algorithm with the benefit of forward and backward dynamic program slicing to narrow down the search for faulty code. Our approach is to use delta debugging algorithm to identify a minimal failure-inducing input, use this input to compute a forward dynamic slice and then intersect the statements in this forward dynamic slice with the statements in the backward dynamic slice of the erroneous output to compute a failure-inducing chop. We implemented our technique and conducted experiments with faulty versions of several programs from the Siemens suite to evaluate our technique. Our experiments show that failure-inducing chops can greatly reduce the size of search space compared to the dynamic slices without significantly compromising the capability to locate the faulty code. We also applied our technique to several programs with known memory related bugs such as buffer overflow bugs. The failure-inducing chop in several of these cases contained only 2 to 4 statements which included the code causing memory corruption.
Conference Paper
We describe a framework for randomized unit testing, and give empirical evidence that generating unit test cases randomly and then minimizing the failing test cases results in significant benefits. Randomized generation of unit test cases (sequences of method calls) has been shown to allow high coverage and to be highly effective. However, failing test cases, if found, are often very long sequences of method calls. We show that Zeller and Hildebrandt's test case minimization algorithm significantly reduces the length of these sequences. We study the resulting benefits qualitatively and quantitatively, via a case study on found open-source data structures and an experiment on lab-built data structures.
Conference Paper
CodeSurfer is a powerful source code analysis and navigation tool for a range of languages, including C/C++ and x86 machine code. The Path Inspector is an add-on to CodeSurfer that allows a user to reason about paths through the program, and which can be used to find programming flaws.
Conference Paper
Dynamic slicing is a well-known program debugging technique. Given a program P and input I, it finds all program statements which directly/indirectly affect the values of some variables' occurrences when P is executed with I. Dynamic slicing algorithms often proceed by traversing the execution trace of P produced by input I (or a dependence graph which captures control/data flow in the execution trace). Consequently, it is important to develop space efficient representations of the execution trace. In this paper, we use results from data compression to compactly represent bytecode traces of sequential Java programs. The major space savings come from the optimized representation of data (instruction) addresses used by memory reference (branch) bytecodes as operands. We give detailed experimental results on the space efficiency and time overheads for our compact trace representation. We then show how dynamic slicing algorithms can directly traverse our compact traces without resorting to costly decompression. We also develop an extension of dynamic slicing which allows us to explain omission errors (i.e. why some events did not happen during program execution).
Article
Given some test case, a program fails. Which circumstances of the test case are responsible for the particular failure? The delta debugging algorithm generalizes and simplifies the failing test case to a minimal test case that still produces the failure. It also isolates the difference between a passing and a failing test case. In a case study, the Mozilla Web browser crashed after 95 user actions. Our prototype implementation automatically simplified the input to three relevant user actions. Likewise, it simplified 896 lines of HTML to the single line that caused the failure. The case study required 139 automated test runs or 35 minutes on a 500 MHz PC.
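The simplification step described in this abstract can be sketched as a small minimizing delta debugging routine. This is an illustrative version written under assumptions, not the authors' prototype: the function name `ddmin`, the `fails` predicate, and the toy failure condition are hypothetical, and the isolation of differences between passing and failing tests is not shown.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def ddmin(items: Sequence[T], fails: Callable[[List[T]], bool]) -> List[T]:
    """Shrink `items` to a sublist that still fails and from which no single
    element can be removed without losing the failure."""
    current = list(items)
    assert fails(current), "the starting input must reproduce the failure"
    granularity = 2
    while len(current) >= 2:
        size = -(-len(current) // granularity)  # ceiling division
        chunks = [current[i:i + size] for i in range(0, len(current), size)]
        reduced = False
        # Step 1: does one chunk alone still fail?
        for chunk in chunks:
            if fails(chunk):
                current, granularity, reduced = chunk, 2, True
                break
        # Step 2: does the input still fail with one chunk taken out?
        if not reduced:
            for skip in range(len(chunks)):
                rest = [x for j, c in enumerate(chunks) if j != skip for x in c]
                if fails(rest):
                    current, reduced = rest, True
                    granularity = max(granularity - 1, 2)
                    break
        # Step 3: otherwise split more finely, or stop at single elements.
        if not reduced:
            if granularity >= len(current):
                break
            granularity = min(len(current), granularity * 2)
    return current


# Toy failure: the "crash" needs both action 3 and action 6 to be present.
minimal = ddmin(list(range(1, 9)), lambda actions: 3 in actions and 6 in actions)
```

Each call to `fails` corresponds to one automated test run, so the number of predicate calls is the cost that the abstract reports for its case study.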