Article

Search by example in TouchDevelop: Code search made easy

Abstract

Code search has always been essential to software development; it is the cornerstone of activities such as program comprehension and maintenance. Traditionally, code search required learning of complex query languages with very steep learning curves.

... com) and Bitbucket (https://bitbucket.org/). It is therefore not surprising that, in the last few years, we have witnessed an increasing interest in code-search techniques (e.g., [1,26,44]). More recently, researchers have moved beyond searching for small snippets of code and have developed sophisticated techniques that allow developers to search for larger fragments of code (e.g., [4,5,37]) and even for graphical user interface (GUI) code [38]. ...
Conference Paper
A typical way to design and develop a mobile app is to sketch the graphical user interfaces (GUIs) for the different screens in the app and then create actual GUIs from these sketches. Doing so involves identifying which layouts to use, which widgets to add, and how to configure and connect the different pieces of the GUI. To help with this difficult and time-consuming task, we propose GUIFetch, a technique that takes as input the sketch for an app and leverages the growing number of open source apps in public repositories to identify apps with GUIs and transitions that are similar to those in the provided sketch. GUIFetch first searches public repositories to find potential apps using keyword matching. It then builds models of the identified apps' screens and screen transitions using a combination of static and dynamic analyses and computes a similarity metric between the models and the provided sketch. Finally, GUIFetch ranks the identified apps (or parts thereof) based on their computed similarity value and produces a visual ranking of the results together with the code of the corresponding apps. We implemented GUIFetch for Android apps and evaluated it through user studies involving different types of apps.
... It uses an internal search engine that understands program structure to find code to test and then presents the result to the user. Other recent code search work on test cases includes (Akhin et al. 2012; Janjic et al. 2009; Lemos et al. 2011) and our S6. Test cases and semantics have also been used in a similar fashion for finding web services (Ernst et al. 2006; Reiss 2005), but have the problem that the user must know exactly what is being searched for (Janjic and Atkinson 2012). ...
Article
User interface design and coding can be complex and messy. We describe a system that uses code search to simplify and automate the exploration of such code. We start with a simple sketch of the desired interface along with a set of keywords describing the application context. If necessary, we convert the sketch into a scalable vector graphics diagram. We then use existing code search engines to find results based on the keywords. We look for potential Java-based graphical user interface solutions within those results and apply a series of code transformations to the solutions to generate derivative solutions, aiming to get solutions that constitute only the user interface and that will compile and run. We run the resultant solutions and compare the generated interfaces to the user’s sketches. Finally, we let programmers interact with the matched solutions and return the running code for the solutions they choose. The system is useful for exploring alternative interfaces to the initial and for looking at graphical user interfaces in a code repository.
Article
Code search is an essential task in software development. Developers often search the internet and other code databases for necessary source code snippets to ease the development efforts. Code search techniques also help learn programming as novice programmers or students can quickly retrieve (hopefully good) examples already used in actual software projects. Given the recurrence of the code search activity in software development, there is an increasing interest in the research community. To improve the code search experience, the research community suggests many code search tools and techniques. These tools and techniques leverage several different ideas and claim a better code search performance. However, it is still challenging to illustrate a comprehensive view of the field considering that existing studies generally explore narrow and limited subsets of used components. This study aims to devise a grounded approach to understanding the procedure for code search and build an operational taxonomy capturing the critical facets of code search techniques. Additionally, we investigate evaluation methods, benchmarks, and datasets used in the field of code search.
Article
We demonstrate a tool for browsing large software repositories such as GitHub or Source Forge using all the facilities one normally associates with an integrated development environment. The tool integrates code search engines with the Code Bubbles development environment. It lets the user perform and compare multiple searches, investigate and explore the results that are returned, expand searches as necessary, and eventually export appropriate results.
Article
User interface design and coding can be complex and messy. We describe a system that uses code search to simplify and automate the generation of such code. We start with a simple sketch of the desired interface along with a set of keywords describing the application context. We then use existing code search engines to find results based on the keywords. We look for potential Java-based user interface solutions within those results and apply a series of code transformations to the solutions to generate derivative solutions, aiming to get solutions that constitute only the user interface and that will compile and run. We run the resultant solutions and compare the generated interfaces to the user's sketches. Finally, we let programmers interact with the matched solutions and return the running code for the solutions they choose. The system can be used not only for generating initial user interface code for an application, but also for exploring alternative interfaces and for looking at the user interfaces in a code repository.
Article
Code duplication, that is, copying a code fragment and reusing it by pasting with or without modifications, is a well-known code smell in software maintenance. Several studies show that about 5% to 20% of a software system can consist of duplicated code, which is basically the result of copying existing code fragments and using them by pasting with or without minor modifications. One of the major shortcomings of such duplicated fragments is that if a bug is detected in a code fragment, all the other fragments similar to it should be investigated to check for the same bug. Refactoring of the duplicated code is another prime issue in software maintenance, although several studies claim that refactoring certain clones is not desirable and that there is a risk in removing them. However, it is also widely agreed that clones should at least be detected. In this paper, we survey the state of the art in clone detection research. First, we describe the clone terms commonly used in the literature along with their corresponding mappings to the commonly used clone types. Second, we provide a review of the existing
Article
A common problem faced by modern mobile-device platforms is that third-party applications in the marketplace may leak private information without notifying users. Existing approaches adopted by these platforms provide little information on what applications will do with the private information, failing to effectively assist users in deciding whether to install applications and in controlling their privacy. To address this problem, we propose a transparent privacy control approach, where an automatic static analysis reveals to the user how private information is used inside an application. This flow information provides users with better insights, enabling them to determine when to use anonymized instead of real information, or to force script termination when scripts access private information. To further reduce the user burden in controlling privacy, our approach provides a default setting based on an extended information flow analysis that tracks whether private information is obscured before escaping through output channels. We built our approach into TouchDevelop, a novel application-creation environment that allows users to write application scripts on mobile devices, share them in a web bazaar, and install scripts published by other users. To evaluate our approach, we plan to study a portion of published scripts in order to evaluate the effectiveness and performance of information flow analysis. We also plan to carry out a user survey to evaluate the usability of our privacy control and guide our future design.
Article
The world is experiencing a technology shift. In 2011, more touchscreen-based mobile devices like smartphones and tablets will be sold than desktops, laptops, and netbooks combined. In fact, in many cases incredibly powerful and easy-to-use smart phones are going to be the first and, in less developed countries, possibly the only computing devices which virtually all people will own, and carry with them at all times. Furthermore, mobile devices do not only have touchscreens, but they are also equipped with a multitude of sensors, such as location information and acceleration, and they are always connected to the cloud. TouchDevelop is a novel application creation environment for anyone to script their smartphones anywhere -- you do not need a separate PC. TouchDevelop allows you to develop mobile device applications that can access your data, your media, your sensors and allows using cloud services including storage, computing, and social networks. TouchDevelop targets students, and hobbyists, not necessarily the professional developer. Typical TouchDevelop applications are written for fun, or for personalizing the phone. TouchDevelop's typed, structured programming language is built around the idea of only using a touchscreen as the input device to author code. It has built-in primitives which make it easy to access the rich sensor data available on a mobile device. In our vision, the state of the program is automatically distributed between mobile clients and the cloud, with automatic synchronization of data and execution between clients and cloud, liberating the programmer from worrying (or even having to know about) the details. We report on our experience with our first prototype implementation for the Windows Phone 7 platform, which already realizes a large portion of our vision. It is available on the Windows Phone Marketplace.
Conference Paper
Although numerous different clone detection approaches have been proposed to date, not a single one is both incremental and scalable to very large code bases. They thus cannot provide real-time cloning information for clone management of very large systems. We present a novel, index-based clone detection algorithm for type 1 and 2 clones that is both incremental and scalable. It enables a new generation of clone management tools that provide real-time cloning information for very large software. We report on several case studies that show both its suitability for real-time clone detection and its scalability: on 42 MLOC of Eclipse code, average time to retrieve all clones for a file was below 1 second; on 100 machines, detection of all clones in 73 MLOC was completed in 36 minutes.
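The abstract above does not describe the index itself, so the following is only a minimal sketch of one way an incremental, index-based detector could work. It assumes statements have already been normalized (so that type 2 clones map to identical text), and it hashes every window of consecutive statements into a shared index. The class `CloneIndex`, the helper `chunk_hashes`, and the window length are hypothetical names and choices for illustration, not the authors' implementation.

```python
import hashlib
from collections import defaultdict


def chunk_hashes(statements, n):
    """Yield (position, hash) for every window of n consecutive statements."""
    for i in range(len(statements) - n + 1):
        chunk = "\n".join(statements[i:i + n])
        yield i, hashlib.sha256(chunk.encode("utf-8")).hexdigest()


class CloneIndex:
    """Hypothetical chunk index: hash -> set of (file, position)."""

    def __init__(self, chunk_len=3):
        self.chunk_len = chunk_len
        self.index = defaultdict(set)
        self.by_file = {}  # file name -> list of (position, hash)

    def add_file(self, name, statements):
        # Re-adding a file first drops its old entries (incremental update).
        self.remove_file(name)
        keys = list(chunk_hashes(statements, self.chunk_len))
        for pos, key in keys:
            self.index[key].add((name, pos))
        self.by_file[name] = keys

    def remove_file(self, name):
        for pos, key in self.by_file.pop(name, []):
            self.index[key].discard((name, pos))
            if not self.index[key]:
                del self.index[key]

    def clones_of(self, name):
        """Return [(position, [other locations])] for chunks seen elsewhere."""
        result = []
        for pos, key in self.by_file.get(name, []):
            others = sorted(loc for loc in self.index[key] if loc != (name, pos))
            if others:
                result.append((pos, others))
        return result


idx = CloneIndex(chunk_len=3)
idx.add_file("a", ["x = 1", "y = 2", "z = x + y", "print(z)"])
idx.add_file("b", ["q = 0", "x = 1", "y = 2", "z = x + y"])
clones_a = idx.clones_of("a")
```

Because a file update touches only that file's own entries, a lookup for one file never has to rescan the rest of the code base, which is the property the abstract refers to as incremental.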
Conference Paper
Detecting code clones has many software engineering applications. Existing approaches either do not scale to large code bases or are not robust against minor code modifications. In this paper, we present an efficient algorithm for identifying similar subtrees and apply it to tree representations of source code. Our algorithm is based on a novel characterization of subtrees with numerical vectors in the Euclidean space $\mathbb{R}^n$ and an efficient algorithm to cluster these vectors w.r.t. the Euclidean distance metric. Subtrees with vectors in one cluster are considered similar. We have implemented our tree similarity algorithm as a clone detection tool called DECKARD and evaluated it on large code bases written in C and Java including the Linux kernel and JDK. Our experiments show that DECKARD is both scalable and accurate. It is also language independent, applicable to any language with a formally specified grammar.
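To make the idea of characterizing a subtree by a numerical vector concrete, here is a small sketch. It is not DECKARD: it assumes Python's built-in `ast` module as a stand-in for a grammar-derived parse tree, counts occurrences of each node type as the vector, and compares two vectors directly with the Euclidean distance instead of clustering them. The function names are illustrative only.

```python
import ast
import math
from collections import Counter


def characteristic_vector(source):
    """Count how often each syntax-tree node type occurs in the source."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def euclidean(u, v):
    """Euclidean distance between two sparse count vectors."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u[k] - v[k]) ** 2 for k in keys))


f = characteristic_vector("def f(x):\n    return x + 1\n")
g = characteristic_vector("def g(y):\n    return y + 2\n")
h = characteristic_vector("def h(z):\n    return z * 2 - 1\n")

same_shape = euclidean(f, g)       # renamed identifiers, same structure
different_shape = euclidean(f, h)  # extra operator changes the counts
```

Renaming identifiers or changing constants leaves the node-type counts untouched, so `f` and `g` coincide, while the structurally different `h` lies at a positive distance. This is the sense in which such vectors are robust against minor modifications.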
Article
A code clone is a code portion in source files that is identical or similar to another. Since code clones are believed to reduce the maintainability of software, several code clone detection techniques and tools have been proposed. This paper proposes a new clone detection technique, which consists of the transformation of input source text and a token-by-token comparison. For its implementation with several useful optimization techniques, we have developed a tool, named CCFinder (Code Clone Finder), which extracts code clones in C, C++, Java, COBOL and other source files. In addition, metrics for the code clones have been developed. In order to evaluate the usefulness of CCFinder and metrics, we conducted several case studies where we applied the new tool to the source code of JDK, FreeBSD, NetBSD, Linux, and many other systems. As a result, CCFinder has effectively found clones and the metrics have been able to effectively identify the characteristics of the systems. In addition, we have compared the proposed technique with other clone detection techniques.
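The two stages named in the abstract, transformation of the source text followed by token-by-token comparison, can be sketched roughly as follows. This is an illustration under simplifying assumptions rather than CCFinder's algorithm: the tokenizer is a small regular expression, the keyword set is a made-up subset, identifiers and numbers are replaced by the placeholders `$id` and `$num`, and the comparison is a quadratic dynamic program over two fragments.

```python
import re

# Assumed keyword subset for the example; a real tool uses the full language.
KEYWORDS = {"if", "else", "for", "while", "return", "int"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def normalize(source):
    """Tokenize and replace identifiers and numbers by placeholders."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("$id")
        elif tok[0].isdigit():
            tokens.append("$num")
        else:
            tokens.append(tok)
    return tokens


def longest_common_run(a, b):
    """Length of the longest contiguous token run shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


left = normalize("int x = y + 1;")
right = normalize("int total = count + 42;")
run = longest_common_run(left, right)
```

After normalization both statements become the same seven-token sequence, so the comparison reports them as a clone pair even though no identifier or literal matches textually.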
Conference Paper
In this paper, we introduce a technique for applying independent component analysis to vector space representations of software code fragments such as methods or blocks. The distance between these points can be determined, and used as a measure of the similarity between the original source code fragments they represent. It can be reasoned that if the initial matrix representation contains enough information about the syntactic structure of the source code, the vector space representation will be sufficient to predict the similarity of fragments to one another, and can provide the likelihood that the code is a clone.
Conference Paper
In this paper, we propose a scalable instant code clone search engine for large-scale software repositories. While there are commercial code search engines available, they treat software as text and often fail to find semantically related code. Meanwhile, existing tools for semantic code clone searches take a "post-mortem" approach involving the detection of clones "after" the code development is completed, and hence, fail to return the results instantly. In clear contrast, we combine the strength of these two lines of existing research, by supporting instant code clone detection. To achieve this goal, we propose scalable indexing structures on vector abstractions of code. Our proposed algorithms allow developers to detect clones of a given code segment among the 1.7 million code segments from 492 open source projects in sub-second response times, without compromising the accuracy obtained by a state-of-the-art tool.
Conference Paper
Model-based development is becoming an increasingly common development methodology. In important domains like embedded systems already major parts of the code are generated from models specified with domain-specific modelling languages. Hence, such models are nowadays an integral part of the software development and maintenance process and therefore have a major economic and strategic value for the software-developing organisations. Nevertheless almost no work has been done on a quality defect that is known to seriously hamper maintenance productivity in classic code-based development: Cloning. This paper presents an approach for the automatic detection of clones in large models as they are used in model-based development of control systems. The approach is based on graph theory and hence can be applied to most graphical data-flow languages. An industrial case study demonstrates the applicability of our approach for the detection of clones in Matlab/Simulink models that are widely used in model-based development of embedded systems in the automotive domain.
Conference Paper
We present a novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under the $l_p$ norm, based on p-stable distributions. Our scheme improves the running time of the earlier algorithm for the case of the $l_2$ norm. It also yields the first known provably efficient approximate NN algorithm for the case $p \le 1$. We also show that the algorithm finds the exact near neighbor in $O(\log n)$ time for data satisfying a certain "bounded growth" condition. Unlike earlier schemes, our LSH scheme works directly on points in the Euclidean space without embeddings. Consequently, the resulting query time bound is free of large factors and is simple and easy to implement. Our experiments (on synthetic data sets) show that our data structure is up to 40 times faster than kd-tree.
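The abstract does not spell out the hash family, so the sketch below uses the form commonly given for p-stable LSH in the $l_2$ case: $h_{a,b}(v) = \lfloor (a \cdot v + b) / w \rfloor$, where the entries of $a$ are drawn from a Gaussian (which is 2-stable) and $b$ is uniform in $[0, w]$. The bucket width `w`, the number of hash functions, and the helper names are assumptions chosen for illustration.

```python
import math
import random


def make_hash(dim, w, rng):
    """Build one hash function h(v) = floor((a . v + b) / w)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # Gaussian entries: 2-stable
    b = rng.uniform(0.0, w)                        # random offset within a bucket

    def h(v):
        projection = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((projection + b) / w)

    return h


def signature(v, hashes):
    """Concatenate several hash values into one bucket key."""
    return tuple(h(v) for h in hashes)


rng = random.Random(0)
hashes = [make_hash(dim=3, w=4.0, rng=rng) for _ in range(5)]

point = [1.0, 2.0, 3.0]
key = signature(point, hashes)
```

Points that are close in Euclidean distance project to nearby values of $a \cdot v$ and therefore tend to share a bucket, while distant points rarely agree on all components of the key. A larger `w` raises the collision probability for both near and far pairs.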
Conference Paper
Source code duplication occurs frequently within large software systems. Pieces of source code, functions, and data types are often duplicated in part or in whole, for a variety of reasons. Programmers may simply be reusing a piece of code via copy and paste or they may be "re-inventing the wheel". Previous research on the detection of clones is mainly focused on identifying pieces of code with similar (or nearly similar) structure. Our approach is to examine the source code text (comments and identifiers) and identify implementations of similar high-level concepts (e.g., abstract data types). The approach uses an information retrieval technique (i.e., latent semantic indexing) to statically analyze the software system and determine semantic similarities between source code documents (i.e., functions, files, or code segments). These similarity measures are used to drive the clone detection process. The intention of our approach is to enhance and augment existing clone detection methods that are based on structural analysis. This synergistic use of methods will improve the quality of clone detection. A set of experiments is presented that demonstrate the usage of semantic similarity measure to identify clones within a version of NCSA Mosaic.
Conference Paper
We present an approach to identifying similar code in programs based on finding similar subgraphs in attributed directed graphs. This approach is used on program dependence graphs and therefore considers not only the syntactic structure of programs but also the data flow within (as an abstraction of the semantics). As a result, there is no tradeoff between precision and recall; our approach is very good in both. An evaluation of our prototype implementation shows that the approach is feasible and gives very good results despite the non-polynomial complexity of the problem.
Article
This paper describes a program called dup that finds occurrences of duplicated or related code in large software systems. The motivation is that duplication may be introduced into a large system as modifications are made to add new features or to fix bugs; rather than rewrite working sections of code, programmers may copy and modify sections of code. Over time, proliferation of copies can make the code more complex and more difficult to maintain. Dup searches such code for all pairs of duplicated sections. The user may choose to search either for identical sections of code, or for sections that match except for substitution of one set of variable names and constants for another as if they were corresponding procedure parameters. Applications of dup could include visualization of the structural complexity of the whole system, identifying unusually complex files, identifying sections of code that should be replaced by procedures, and debugging. Introduction. This paper describes a new ...
N. Tillmann, M. Moskal, J. de Halleux, and M. Fä, "Touchdevelop: Programming cloud-connected mobile devices via touchscreen," in Proceedings of the 10th SIGPLAN symposium on New ideas, new paradigms, and reflections on programming and software, ser. ONWARD '11. New York, NY, USA: ACM, 2011, pp. 49–60.
C. K. Roy and J. R. Cordy, "A survey on software clone detection research," School of Computing, Queen's University.