Conference Paper

Meta: Enabling Programming Languages to Learn from the Crowd

Authors: Ethan Fast and Michael S. Bernstein

Abstract

Collectively authored programming resources such as Q&A sites and open-source libraries provide a limited window into how programs are constructed, debugged, and run. To address these limitations, we introduce Meta: a language extension for Python that allows programmers to share functions and track how they are used by a crowd of other programmers. Meta functions are shareable via URL and instrumented to record runtime data. Combining thousands of Meta functions with their collective runtime data, we demonstrate tools including an optimizer that replaces your function with a more efficient version written by someone else, an auto-patcher that saves your program from crashing by finding equivalent functions in the community, and a proactive linter that warns you when a function fails elsewhere in the community. We find that professional programmers are able to use Meta for complex tasks (creating new Meta functions that, for example, cross-validate a logistic regression), and that Meta is able to find 44 optimizations (for a 1.45 times average speedup) and 5 bug fixes across the crowd.
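The abstract notes that Meta functions are shareable via URL and instrumented to record runtime data, and a citing paper below mentions that sharing relies on function decorators. As a rough, hedged sketch only: the `meta` decorator, the in-memory log, and the example URL here are invented for illustration and are not Meta's actual API.

```python
import functools
import time

# Hypothetical in-memory log standing in for Meta's shared runtime-data store.
RUNTIME_LOG = []

def meta(share_url):
    """Illustrative decorator: tag a function with a share URL and record each call."""
    def wrap(fn):
        @functools.wraps(fn)
        def instrumented(*args, **kwargs):
            start = time.perf_counter()
            error = None
            try:
                return fn(*args, **kwargs)
            except Exception as exc:            # failures are recorded too
                error = repr(exc)
                raise
            finally:
                RUNTIME_LOG.append({
                    "url": share_url,
                    "function": fn.__name__,
                    "args": args,
                    "elapsed_s": time.perf_counter() - start,
                    "error": error,
                })
        instrumented.meta_url = share_url
        return instrumented
    return wrap

@meta("https://example.com/functions/slugify")   # placeholder share URL
def slugify(text):
    return "-".join(text.lower().split())

slugify("Hello Meta World")
print(RUNTIME_LOG[-1])
```

Aggregating logs like this one across many users is what would let tools such as the optimizer, auto-patcher, and proactive linter described above compare implementations.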


... In learning, too, social computing is helping learners diagnose errors [20] and learn programming [2]. One interesting application of social computing has the potential to help learners improve their code-writing skills by sharing their code with peers and observing how those peers improve it [4]. An alternative is to receive continuous code reviews from a learner's peers [3] to help improve coding style and proficiency. ...
... Recent efforts to leverage the crowd as a system development platform [9] are a step forward. OpenSchool adopts crowd programming as a social computing and learning platform, provides tools to track, sanitize, and extract shared knowledge as usable programming artifacts, and documents its evolution in ways similar to Meta [4]. OpenSchool is able to detect code similarity in ways similar to code clone [15] or plagiarism detection [10] systems. ...
... To achieve goals of this nature, flash teams decompose the work into a sequence of linked tasks and handoffs, which are completed by diverse experts from the crowd who are computationally managed. Other crowd work systems have hired experts for complex goals, such as writing [48], software development [15,24], mentorship and skill development [59] and physical world tasks [61]. Similar to microtask crowdsourcing systems, however, most of these expert crowdsourcing systems rely on workflows of decomposed tasks. ...
... While Upwork and expert design tasks are currently the state-of-the-art in crowdsourcing and crowd research [15,24,48,52,59,61], it is possible that the effects will differ with alternative goals or workflows. In particular, it is possible that these results do not hold with single-expertise teams such as with only software engineers, or that these results are particular to the combination of expertise types that we recruited. ...
Article
The dominant crowdsourcing infrastructure today is the workflow, which decomposes goals into small independent tasks. However, complex goals such as design and engineering have remained stubbornly difficult to achieve with crowdsourcing workflows. Is this due to a lack of imagination, or a more fundamental limit? This paper explores this question through in-depth case studies of 22 workers across six workflow-based crowd teams, each pursuing a complex and interdependent web development goal. We used an inductive mixed method approach to analyze behavior trace data, chat logs, survey responses and work artifacts to understand how workers enacted and adapted the crowdsourcing workflows. Our results indicate that workflows served as useful coordination artifacts, but in many cases critically inhibited crowd workers from pursuing real-time adaptations to their work plans. However, the CSCW and organizational behavior literature argues that all sufficiently complex goals require open-ended adaptation. If complex work requires adaptation but traditional static crowdsourcing workflows can't support it, our results suggest that complex work may remain a fundamental limitation of workflow-based crowdsourcing infrastructures.
... As another example, an educator running an introductory programming class could invite students to submit their implementations of a basic algorithm to a joint hosting repository, such that they could share in the code review process and learn from the implementations of others. Similarly, a Python language extension for sharing simple functions [7] could use the functionality of the development environment to share functions, rather than requiring the manual addition of function decorators. ...
Preprint
Full-text available
Developers in data science and other domains frequently use computational notebooks to create exploratory analyses and prototype models. However, they often struggle to incorporate existing software engineering tooling into these notebook-based workflows, leading to fragile development processes. We introduce Assemblé, a new development environment for collaborative data science projects, in which promising code fragments of data science pipelines can be contributed as pull requests to an upstream repository entirely from within JupyterLab, abstracting away low-level version control tool usage. We describe the design and implementation of Assemblé and report on a user study of 23 data scientists.
... They built on this work to develop Blueprint, a web search interface integrated into a development environment to support searching for relevant code examples from forums, blogs, and tutorials [5]. Fast and Bernstein designed Meta, a Python language extension that allows programmers to share and compare their implementations of utility functions [19]. These approaches support developers finding and reusing code snippets (i.e., foraging), but are not explicitly designed to support learning the underlying programming concepts and design patterns. ...
Article
Online resources can help novice developers learn basic programming skills, but few resources support progressing from writing working code to learning professional web development practices. We address this gap by advancing Readily Available Learning Experiences, a conceptual approach for transforming all professional web applications into opportunities for authentic learning. This article presents Isopleth, a web-based platform that helps learners make sense of complex code constructs and hidden asynchronous relationships in professional web code. Isopleth embeds sensemaking scaffolds informed by the learning sciences to (1) expose hidden functional and event-driven relationships, (2) surface functionally related slices of code, and (3) support learners manipulating the provided code representations. To expose event-driven relationships, Isopleth implements a novel technique called Serialized Deanonymization to determine and visualize asynchronous functional relationships. To evaluate Isopleth, we conducted a case study across 12 professional websites and a user study with 14 junior and senior developers. Results show that Isopleth’s sensemaking scaffolds helped to surface implementation approaches in event binding, web application design, and complex interactive features across a range of complex professional web applications. Moreover, Isopleth helped junior developers improve the accuracy of their conceptual models of how features are implemented by 31% on average.
... Query-feature graphs provided an early foundation for these methods, connecting user requests with system commands in an image editing application [12]. Others have since extended this approach: for example, using word embedding models to connect user vocabulary with system keywords [1,7] or programming syntax [38], and statistical language models to predict functions a user wants to call [11,28,8]. All of these systems solve the vocabulary problem [10] through statistical models that map natural language to the domain language of a system. ...
Conference Paper
Full-text available
Today, most conversational agents are limited to simple tasks supported by standalone commands, such as getting directions or scheduling an appointment. To support more complex tasks, agents must be able to generalize from and combine the commands they already understand. This paper presents a new approach to designing conversational agents inspired by linguistic theory, where agents can execute complex requests interactively by combining commands through nested conversations. We demonstrate this approach in Iris, an agent that can perform open-ended data science tasks such as lexical analysis and predictive modeling. To power Iris, we have created a domain-specific language that transforms Python functions into combinable automata and regulates their combinations through a type system. Running a user study to examine the strengths and limitations of our approach, we find that data scientists completed a modeling task 2.6 times faster with Iris than with Jupyter Notebook.
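The Iris abstract describes a domain-specific language that turns Python functions into combinable commands regulated by a type system. As a loose illustration of the registration-and-composition idea only: the `command` decorator, the phrase strings, and the toy functions below are invented and are not Iris's actual DSL.

```python
import math
from typing import Callable, Dict

# Hypothetical registry mapping natural-language phrases to typed Python functions.
COMMANDS: Dict[str, Callable] = {}

def command(phrase: str):
    """Illustrative decorator: expose a Python function as a combinable command."""
    def wrap(fn):
        COMMANDS[phrase] = fn
        return fn
    return wrap

@command("log-transform the data")
def log_transform(values: list) -> list:
    return [math.log(v) for v in values]

@command("plot a histogram")
def histogram(values: list, bins: int = 5) -> dict:
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return dict(enumerate(counts))

# Nesting one command inside another, in the spirit of Iris's combined requests.
data = [1, 2, 4, 8, 16, 32]
print(COMMANDS["plot a histogram"](COMMANDS["log-transform the data"](data)))
```

In the real system the type annotations would govern which commands may be nested; here they are only hints.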
... Query-feature graphs set the foundation for these methods, connecting user requests with system commands in an image editing application [10]. Others have since extended this approach: for example, using word embedding models to connect user vocabulary with system keywords [1,5] or programming syntax [38], and statistical language models to predict functions a user wants to call [9,26,6]. All of these systems solve the vocabulary problem [10] through statistical models that map natural language to the domain language of a system. ...
Article
Today's conversational agents are restricted to simple standalone commands. In this paper, we present Iris, an agent that draws on human conversational strategies to combine commands, allowing it to perform more complex tasks that it has not been explicitly designed to support: for example, composing one command to "plot a histogram" with another to first "log-transform the data". To enable this complexity, we introduce a domain specific language that transforms commands into automata that Iris can compose, sequence, and execute dynamically by interacting with a user through natural language, as well as a conversational type system that manages what kinds of commands can be combined. We have designed Iris to help users with data science tasks, a domain that requires support for command combination. In evaluation, we find that data scientists complete a predictive modeling task significantly faster (2.6 times speedup) with Iris than a modern non-conversational programming environment. Iris supports the same kinds of commands as today's agents, but empowers users to weave together these commands to accomplish complex goals.
... Recent work instead sought to achieve complex goals by moving from microtask workers to expert workers. Such systems now support user interface prototyping [70], question-answering and debugging for software engineers [11,22,50], worker management [28,45], remote writing tasks [61], and skill training [77]. For example, flash teams demonstrated that expert workflows can achieve far more complex goals than can be accomplished using microtask workflows [70]. ...
Conference Paper
This paper introduces flash organizations: crowds structured like organizations to achieve complex and open-ended goals. Microtask workflows, the dominant crowdsourcing structures today, only enable goals that are so simple and modular that their path can be entirely pre-defined. We present a system that organizes crowd workers into computationally-represented structures inspired by those used in organizations - roles, teams, and hierarchies - which support emergent and adaptive coordination toward open-ended goals. Our system introduces two technical contributions: 1) encoding the crowd's division of labor into de-individualized roles, much as movie crews or disaster response teams use roles to support coordination between on-demand workers who have not worked together before; and 2) reconfiguring these structures through a model inspired by version control, enabling continuous adaptation of the work and the division of labor. We report a deployment in which flash organizations successfully carried out open-ended and complex goals previously out of reach for crowdsourcing, including product design, software development, and game production. This research demonstrates digitally networked organizations that flexibly assemble and reassemble themselves from a globally distributed online workforce to accomplish complex work.
... Expert crowdsourcing enables access to a much broader set of workers, for example designers and programmers. The same ideas can then be applied to expert "macro-tasks" [32,57], enabling the crowdsourcing of goals such as user-centered design [121], programming [91,45,30], and mentorship [142]. However, there remains the open question of how complex the work outcomes from expert crowds can be. ...
Conference Paper
The internet is empowering the rise of crowd work, gig work, and other forms of on-demand labor. A large and growing body of scholarship has attempted to predict the socio-technical outcomes of this shift, especially addressing three questions: 1) What are the complexity limits of on-demand work?, 2) How far can work be decomposed into smaller microtasks?, and 3) What will work and the place of work look like for workers? In this paper, we look to the historical scholarship on piecework — a similar trend of work decomposition, distribution, and payment that was popular at the turn of the 20th century — to understand how these questions might play out with modern on-demand work. We identify the mechanisms that enabled and limited piecework historically, and identify whether on-demand work faces the same pitfalls or might differentiate itself. This approach introduces theoretical grounding that can help address some of the most persistent questions in crowd work, and suggests design interventions that learn from history rather than repeat it.
... Recent work instead sought to achieve complex goals by moving from microtask workers to expert workers. Such systems now support user interface prototyping [70], question-answering and debugging for software engineers [11,22,50], worker management [28,45], remote writing tasks [61], and skill training [77]. For example, flash teams demonstrated that expert workflows can achieve far more complex goals than can be accomplished using microtask workflows [70]. ...
Conference Paper
While instrumental ensemble playing can benefit children's music education and collaboration skill development, it requires extensive training on music and instruments, which many school children lack. To help children with limited music training experience instrumental ensemble playing, we created EnseWing, an interactive system that offers such an experience. In this paper, we report the design of the EnseWing experience and a two-month field study. Our results show that EnseWing preserves the music and ensemble skills from traditional instrumental ensemble and provides more collaboration opportunities for children.
Conference Paper
Full-text available
In this paper, we investigate an approach to program synthesis that is based on crowd-sourcing. With the help of crowd-sourcing, we aim to capture the "wisdom of the crowds" to find good if not perfect solutions to inherently tricky programming tasks, which elude even expert developers and lack an easy-to-formalize specification. We propose an approach we call program boosting, which involves crowd-sourcing imperfect solutions to a difficult programming problem from developers and then blending these programs together in a way that improves their correctness. We implement this approach in a system called CROWDBOOST and show in our experiments that interesting and highly non-trivial tasks such as writing regular expressions for URLs or email addresses can be effectively crowd-sourced. We demonstrate that carefully blending the crowd-sourced results together consistently produces a boost, yielding results that are better than any of the starting programs. Our experiments on 465 program pairs show consistent boosts in accuracy and demonstrate that program boosting can be performed at a relatively modest monetary cost.
Conference Paper
Full-text available
Users often make continued and sustained use of online resources to complement use of a desktop application. For example, users may reference online tutorials to recall how to perform a particular task. While often used in a coordinated fashion, the browser and desktop application provide separate, independent mechanisms for helping users find and refind task-relevant information. In this paper, we describe InterTwine, a system that links information in the web browser with relevant elements in the desktop application to create interapplication information scent. This explicit link produces a shared interapplication history to assist in re-finding information in both applications. As an example, InterTwine marks all menu items in the desktop application that are currently mentioned in the front-most web page. This paper introduces the notion of interapplication information scent, demonstrates the concept in InterTwine, and describes results from a formative study suggesting the utility of the concept.
Conference Paper
Full-text available
We introduce flash teams, a framework for dynamically assembling and managing paid experts from the crowd. Flash teams advance a vision of expert crowd work that accomplishes complex, interdependent goals such as engineering and design. These teams consist of sequences of linked modular tasks and handoffs that can be computationally managed. Interactive systems reason about and manipulate these teams' structures: for example, flash teams can be recombined to form larger organizations and authored automatically in response to a user's request. Flash teams can also hire more people elastically in reaction to task needs, and pipeline intermediate output to accelerate completion times. To enable flash teams, we present Foundry, an end-user authoring platform and runtime manager. Foundry allows users to author modular tasks, then manages teams through handoffs of intermediate work. We demonstrate that Foundry and flash teams enable crowdsourcing of a broad class of goals including design prototyping, course development, and film animation, in half the work time of traditional self-managed teams.
Conference Paper
Full-text available
Although privacy is broadly recognized as a dominant concern for the development of novel interactive technologies, our ability to reason analytically about privacy in real settings is limited. A lack of conceptual interpretive frameworks makes it difficult to unpack interrelated privacy issues in settings where information technology is also present. Building on theory developed by social psychologist Irwin Altman, we outline a model of privacy as a dynamic, dialectic process. We discuss three tensions that govern interpersonal privacy management in everyday life, and use these to explore select technology case studies drawn from the research literature. These suggest new ways for thinking about privacy in socio-technical environments as a practical matter.
Conference Paper
Full-text available
Software engineers often use Q&A forums like Stack Overflow and MSDN to ask and answer technical questions. Through a survey study and web browser log analysis, we find that both askers and answerers of technical forum questions typically conduct extensive online research before composing their posts. The inclusion of links to these research materials is beneficial to the forum participants, though post authors do not always include such citations. Based on these findings, we developed CiteHistory, a browser plugin that simplifies the process of including relevant search queries and URLs as bibliographic supplements to forum posts, and supports information re-finding for post authors. We discuss the results of a two-week deployment of CiteHistory with professional software engineers, which demonstrated that CiteHistory increased reference inclusion in posts, and offered auxiliary benefits as a personal research tracker.
Article
Full-text available
We present a direction of research on principles and methods that can enable general problem solving via human computation systems. A key challenge in human computation is the effective and efficient coordination of problem solving. While simple tasks may be easy to partition across individuals, more complex tasks highlight challenges and opportunities for more sophisticated coordination and optimization, leveraging such core notions as problem decomposition, subproblem routing and solution, and the recomposition of solved subproblems into solutions. We discuss the interplay between algorithmic paradigms and human abilities, and illustrate through examples how members of a crowd can play diverse roles in an organized problem-solving process, serving not only as 'data oracles' at the endpoints of computation, but also as modules for decomposing problems, controlling the algorithmic progression, and performing human program synthesis.
Conference Paper
Full-text available
Debugging is still among the most common and costly of programming activities. One reason is that current debugging tools do not directly support the inquisitive nature of the activity. Interrogative Debugging is a new debugging paradigm in which programmers can ask why did and even why didn't questions directly about their program's runtime failures. The Whyline is a prototype Interrogative Debugging interface for the Alice programming environment that visualizes answers in terms of runtime events directly relevant to a programmer's question. Comparisons of identical debugging scenarios from user tests with and without the Whyline showed that the Whyline reduced debugging time by nearly a factor of 8, and helped programmers complete 40% more tasks.
Conference Paper
Full-text available
The ready availability of online source code examples has changed the cost structure of programming by example modification. However, current search tools are wholly separate from editing tools. What benefits might be realized by integrating them? This paper describes the design, implementation, and evaluation of Blueprint, a tool that integrates Web search into the Adobe Flex Builder development environment. Blueprint automatically augments queries with code context, presents an example-centric view of search results, and retains a link between copied code and its source. This paper introduces a technique for retrieving relevant example code, descriptions, and running examples for a user's query. A between-subjects study found that Blueprint enables participants to search for and select example code significantly faster than with a standard Web browser.
Conference Paper
Full-text available
This paper investigates the role of online resources in problem solving. We look specifically at how programmers—an exemplar form of knowledge workers—opportunistically interleave Web foraging, learning, and writing code. We describe two studies of how programmers use online resources. The first, conducted in the lab, observed participants' Web use while building an online chat room. We found that programmers leverage online resources with a range of intentions: They engage in just-in-time learning of new skills and approaches, clarify and extend their existing knowledge, and remind themselves of details deemed not worth remembering. The results also suggest that queries for different purposes have different styles and durations. Do programmers' queries "in the wild" have the same range of intentions, or is this result an artifact of the particular lab setting? We analyzed a month of queries to an online programming portal, examining the lexical structure, refinements made, and result pages visited. Here we also saw traits that suggest the Web is being used for learning and reminding. These results contribute to a theory of online resource usage in programming, and suggest opportunities for tools to facilitate online knowledge work.
Conference Paper
Full-text available
This paper analyzes a Question & Answer site for programmers, Stack Overflow, that dramatically improves on the utility and performance of Q&A systems for technical domains. Over 92% of Stack Overflow questions about expert topics are answered - in a median time of 11 minutes. Using a mixed methods approach that combines statistical data analysis with user interviews, we seek to understand this success. We argue that it is not primarily due to an a priori superior technical design, but also to the high visibility and daily involvement of the design team within the community they serve. This model of continued community leadership presents challenges to both CSCW systems research as well as to attempts to apply the Stack Overflow model to other specialized knowledge domains.
Conference Paper
Full-text available
This paper introduces query-feature graphs, or QF-graphs. QF-graphs encode associations between high-level descriptions of user goals (articulated as natural language search queries) and the specific features of an interactive system relevant to achieving those goals. For example, a QF-graph for the GIMP graphics manipulation software links the query "GIMP black and white" to the commands "desaturate" and "grayscale." We demonstrate how QF-graphs can be constructed using search query logs, search engine results, web page content, and localization data from interactive systems. An analysis of QF-graphs shows that the associations produced by our approach exhibit levels of accuracy that make them eminently usable in a range of real-world applications. Finally, we present three hypothetical user interface mechanisms that illustrate the potential of QF-graphs: search-driven interaction, dynamic tooltips, and app-to-app analogy search.
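As a toy illustration of the query-to-command associations this abstract describes, the sketch below stores a few QF-graph edges as weighted mappings and ranks commands for a new query by token overlap. The example edges, weights, and scoring rule are invented; the paper builds its graphs from query logs, search results, page content, and localization data.

```python
# Toy QF-graph: edges from natural-language queries to system commands, with weights.
QF_EDGES = {
    "gimp black and white": {"desaturate": 0.9, "grayscale": 0.8},
    "gimp blur background": {"gaussian blur": 0.85, "select by color": 0.4},
}

def commands_for(query, top_k=2):
    """Rank commands by summed edge weight over stored queries that share tokens."""
    tokens = set(query.lower().split())
    scores = {}
    for known_query, edges in QF_EDGES.items():
        known_tokens = set(known_query.split())
        overlap = len(tokens & known_tokens) / len(known_tokens)
        for cmd, weight in edges.items():
            scores[cmd] = scores.get(cmd, 0.0) + overlap * weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

print(commands_for("make photo black and white"))   # desaturate and grayscale rank first
```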
Conference Paper
Full-text available
We present Inky, a command line for shortcut access to common web tasks. Inky aims to capture the efficiency benefits of typed commands while mitigating their usability problems. Inky commands have little or no new syntax to learn, and the system displays rich visual feedback while the user is typing, including missing parameters and contextual information automatically clipped from the target web site. Inky is an example of a new kind of hybrid between a command line and a GUI interface. We describe the design and implementation of two prototypes of this idea, and report the results of a field study.
Conference Paper
Full-text available
We propose a low-overhead sampling infrastructure for gathering information from the executions experienced by a program's user community. Several example applications illustrate ways to use sampled instrumentation to isolate bugs. Assertion-dense code can be transformed to share the cost of assertions among many users. Lacking assertions, broad guesses can be made about predicates that predict program errors and a process of elimination used to whittle these down to the true bug. Finally, even for non-deterministic bugs such as memory corruption, statistical modeling based on logistic regression allows us to identify program behaviors that are strongly correlated with failure and are therefore likely places to look for the error.
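The sampling idea described here operates at the compiler level across a user community; purely as a Python analogue of the concept, the sketch below evaluates a cheap bug-predictor predicate on only a small fraction of executions and tallies the outcomes. The predicate, sampling rate, and counter are invented for illustration.

```python
import random
from collections import Counter

# Hypothetical tally of (predicate, outcome) observations sampled across many runs.
PREDICATE_COUNTS = Counter()

def sample_check(name, predicate, rate=0.01):
    """Evaluate an instrumentation predicate on only a small fraction of executions."""
    if random.random() < rate:
        PREDICATE_COUNTS[(name, bool(predicate()))] += 1

def divide(a, b):
    # A broad, cheap guess at a bug-predicting predicate, paid for only when sampled.
    sample_check("b_is_zero", lambda: b == 0)
    return a / b if b != 0 else float("inf")

for _ in range(10_000):
    divide(random.randint(0, 9), random.randint(0, 3))

# Predicates whose outcomes correlate with failures across users point to likely bugs.
print(PREDICATE_COUNTS)
```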
Article
Full-text available
Traditional HTML interfaces for input to and output from Bioinformatics analysis on the Web are highly variable in style, content and data formats. Combining multiple analyses can therefore be an onerous task for biologists. Semantic Web Services allow automated discovery of conceptual links between remote data analysis servers. A shared data ontology and service discovery/execution framework is particularly attractive in Bioinformatics, where data and services are often both disparate and distributed. Instead of biologists copying, pasting and reformatting data between various Web sites, Semantic Web Service protocols such as MOBY-S hold out the promise of seamlessly integrating multi-step analysis. We have developed a program (Seahawk) that allows biologists to intuitively and seamlessly chain together Web Services using a data-centric, rather than the customary service-centric approach. The approach is illustrated with a ferredoxin mutation analysis. Seahawk concentrates on lowering entry barriers for biologists: no prior knowledge of the data ontology, or relevant services is required. In stark contrast to other MOBY-S clients, in Seahawk users simply load Web pages and text files they already work with. Underlying the familiar Web-browser interaction is an XML data engine based on extensible XSLT style sheets, regular expressions, and XPath statements which import existing user data into the MOBY-S format. As an easily accessible applet, Seahawk moves beyond standard Web browser interaction, providing mechanisms for the biologist to concentrate on the analytical task rather than on the technical details of data formats and Web forms. As the MOBY-S protocol nears a 1.0 specification, we expect more biologists to adopt these new semantic-oriented ways of doing Web-based analysis, which empower them to do more complicated, ad hoc analysis workflow creation without the assistance of a programmer.
Article
Full-text available
QuickCheck is a tool which aids the Haskell programmer in formulating and testing properties of programs. Properties are described as Haskell functions, and can be automatically tested on random input, but it is also possible to define custom test data generators. We present a number of case studies, in which the tool was successfully used, and also point out some pitfalls to avoid. Random testing is especially suitable for functional programs because properties can be stated at a fine grain. When a function is built from separately tested components, then random testing suffices to obtain good coverage of the definition under test.
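QuickCheck itself is a Haskell tool; to keep code examples in one language, here is a minimal Python sketch of the same property-plus-generator idea, with a hand-rolled random data generator. The function names are invented and this is not QuickCheck's API.

```python
import random

def prop_reverse_twice_is_identity(xs):
    """Property: reversing a list twice yields the original list."""
    return list(reversed(list(reversed(xs)))) == xs

def random_int_list(max_len=20):
    """Hand-rolled test-data generator, in the spirit of QuickCheck's custom generators."""
    return [random.randint(-1000, 1000) for _ in range(random.randint(0, max_len))]

def check(prop, gen, runs=100):
    """Run the property on randomly generated inputs, reporting the first failure."""
    for i in range(runs):
        case = gen()
        if not prop(case):
            return f"Falsified after {i + 1} tests: {case}"
    return f"OK, passed {runs} tests."

print(check(prop_reverse_twice_is_identity, random_int_list))
```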
Conference Paper
While the web contains many social websites, people are generally left in the dark about the activities of other people traversing the web as a whole. In this paper, we explore the potential benefits and privacy considerations around generating a real-time, publicly accessible stream of web activity where users can publish chosen parts of their web browsing data. Taking inspiration from social media systems, we describe individual benefits that can be unlocked by such sharing and that may incentivize users to publish aspects of their browsing. We ask whether and how these benefits outweigh potential costs in lost privacy. We conduct our study of public web activity sharing through scenario-based interviews and a field deployment of a tool for web activity sharing.
Conference Paper
An effective way to learn computer programming is to sit side-by-side in front of the same computer with a tutor or peer, write code together, and then discuss what happens as the code executes. To bring this kind of in-person interaction to an online setting, we have developed Codechella, a multi-user Web-based program visualization system that enables multiple people to collaboratively write code together, explore an automatically-generated visualization of its execution state using multiple mouse cursors, and chat via an embedded text box. In nine months of live deployment on an educational website - www.pythontutor.com - people from 296 cities across 40 countries participated in 299 Codechella sessions for both tutoring and collaborative learning. 57% of sessions connected participants from different cities, and 12% from different countries. Participants actively engaged with the program visualizations while chatting, showed affective exchanges such as encouragement and banter, and indicated signs of learning at the lower three levels of Bloom's taxonomy: remembering, understanding, and applying knowledge.
Conference Paper
A common way to learn is by studying written step-by-step tutorials such as worked examples. However, tutorials for computer programming can be tedious to create since a static text-based format cannot convey what happens as code executes. We created a system called Codepourri that enables people to easily create visual coding tutorials by annotating steps in an automatically-generated program visualization. Using Codepourri, we developed a novel crowdsourcing workflow where learners who are visiting an educational website (www.pythontutor.com) collectively create a tutorial by annotating execution steps in a piece of code and then voting on the best annotations. Since there are far more learners than experts, using learners as a crowd is a potentially more scalable way of creating tutorials. Our experiments with 4 expert judges and 101 learners adding 145 raw annotations to two pieces of textbook Python code show the learner crowd's annotations to be accurate, informative, and containing some insights that even experts missed.
Conference Paper
Documentation and unit tests increase software maintainability, but real world software projects rarely have adequate coverage. We hypothesize that, in part, this is because existing authoring tools require developers to adjust their workflows significantly. To study whether improved interaction design could affect unit testing and documentation practice, we created an authoring support tool called Vesta. The main insight guiding Vesta's interaction design is that developers frequently manually test the software they are building. We propose leveraging runtime information from these manual executions. Because developers naturally exercise the part of the code on which they are currently working, this information will be highly relevant to appropriate documentation and testing tasks. In a complex coding task, nearly all documentation created using Vesta was accurate, compared to only 60% of documentation created without Vesta, and Vesta was able to generate significant portions of all tests, even those written manually by developers without Vesta.
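Vesta's central insight, as described, is to reuse runtime information from the manual executions developers already perform. A hedged Python sketch of that capture step follows: a decorator records arguments and results of ordinary manual runs and drafts an assertion from them. The decorator and output format are invented, not Vesta's.

```python
import functools

DRAFT_TESTS = []   # invented store for test assertions drafted from manual runs

def capture(fn):
    """Record arguments and results of manual executions and draft a test from them."""
    @functools.wraps(fn)
    def wrapper(*args):
        result = fn(*args)
        arg_src = ", ".join(repr(a) for a in args)
        DRAFT_TESTS.append(f"assert {fn.__name__}({arg_src}) == {result!r}")
        return result
    return wrapper

@capture
def median(xs):
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

median([3, 1, 2])        # an ordinary manual check by the developer...
print(DRAFT_TESTS[-1])   # ...yields a draft test: assert median([3, 1, 2]) == 2
```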
Conference Paper
" Current traditional feedback methods, such as hand-grading student code for substance and style, are labor intensive and do not scale. We created a user interface that addresses feedback at scale for a particular and important aspect of code quality: variable names. We built this user interface on top of an existing back-end that distinguishes variables by their behavior in the program. Therefore our interface not only allows teachers to comment on poor variable names, they can comment on names that mislead the reader about the variable's role in the program. We ran two user studies in which 10 teachers and 6 students created and received feedback, respectively. The interface helped teachers give personalized variable name feedback on thousands of student solutions from an edX introductory programming MOOC. In the second study, students composed solutions to the same programming assignments and immediately received personalized quizzes composed by teachers in the previous user study.
Conference Paper
" Answering questions with data is a difficult and time-consuming process. Visual dashboards and templates make it easy to get started, but asking more sophisticated questions often requires learning a tool designed for expert analysts. Natural language interaction allows users to ask questions directly in complex programs without having to learn how to use an interface. However, natural language is often ambiguous. In this work we propose a mixed-initiative approach to managing ambiguity in natural language interfaces for data visualization. We model ambiguity throughout the process of turning a natural language query into a visualization and use algorithmic disambiguation coupled with interactive ambiguity widgets. These widgets allow the user to resolve ambiguities by surfacing system decisions at the point where the ambiguity matters. Corrections are stored as constraints and influence subsequent queries. We have implemented these ideas in a system, DataTone. In a comparative study, we find that DataTone is easy to learn and lets users ask questions without worrying about syntax and proper question form.
Conference Paper
Microtask crowdsourcing organizes complex work into workflows, decomposing large tasks into small, relatively independent microtasks. Applied to software development, this model might increase participation in open source software development by lowering the barriers to contribution and dramatically decrease time to market by increasing the parallelism in development work. To explore this idea, we have developed an approach to decomposing programming work into microtasks. Work is coordinated through tracking changes to a graph of artifacts, generating appropriate microtasks and propagating change notifications to artifacts with dependencies. We have implemented our approach in CrowdCode, a cloud IDE for crowd development. To evaluate the feasibility of microtask programming, we performed a small study and found that a small crowd of 12 workers was able to successfully write 480 lines of code and 61 unit tests in 14.25 person-hours of time.
Article
While the web contains many social websites, people are generally left in the dark about the activities of other people traversing the web as a whole. In this paper, we explore the potential benefits and privacy considerations around generating a real-time, publicly accessible stream of web activity where users can publish chosen parts of their web browsing data. Taking inspiration from social media systems, we describe individual benefits that can be unlocked by such sharing and that may incentivize users to publish aspects of their browsing. We ask whether and how these benefits outweigh potential costs in lost privacy. We conduct our study of public web activity sharing through scenario-based interviews and a field deployment of a tool for web activity sharing.
Article
Users often describe what they want to accomplish with an application in a language that is very different from the application's domain language. To address this gap between system and human language, we propose modeling an application's domain language by mining a large corpus of Web documents about the application using deep learning techniques. A high dimensional vector space representation can model the relationships between user tasks, system commands, and natural language descriptions and supports mapping operations, such as identifying likely system commands given natural language queries and identifying user tasks given a trace of user operations. We demonstrate the feasibility of this approach with a system, COMMANDSPACE, for the popular photo editing application Adobe Photoshop. We build and evaluate several applications enabled by our model showing the power and flexibility of this approach.
Article
In this paper, we investigate an approach to program synthesis that is based on crowd-sourcing. With the help of crowd-sourcing, we aim to capture the "wisdom of the crowds" to find good if not perfect solutions to inherently tricky programming tasks, which elude even expert developers and lack an easy-to-formalize specification. We propose an approach we call program boosting, which involves crowd-sourcing imperfect solutions to a difficult programming problem from developers and then blending these programs together in a way that improves their correctness. We implement this approach in a system called CROWDBOOST and show in our experiments that interesting and highly non-trivial tasks such as writing regular expressions for URLs or email addresses can be effectively crowd-sourced. We demonstrate that carefully blending the crowd-sourced results together consistently produces a boost, yielding results that are better than any of the starting programs. Our experiments on 465 program pairs show consistent boosts in accuracy and demonstrate that program boosting can be performed at a relatively modest monetary cost.
Article
In MOOCs, a single programming exercise may produce thousands of solutions from learners. Understanding solution variation is important for providing appropriate feedback to students at scale. The wide variation among these solutions can be a source of pedagogically valuable examples and can be used to refine the autograder for the exercise by exposing corner cases. We present OverCode, a system for visualizing and exploring thousands of programming solutions. OverCode uses both static and dynamic analysis to cluster similar solutions, and lets teachers further filter and cluster solutions based on different criteria. We evaluated OverCode against a nonclustering baseline in a within-subjects study with 24 teaching assistants and found that the OverCode interface allows teachers to more quickly develop a high-level view of students' understanding and misconceptions, and to provide feedback that is relevant to more students' solutions.
Conference Paper
We propose an approach for the static analysis of probabilistic programs that sense, manipulate, and control based on uncertain data. Examples include programs used in risk analysis, medical decision making and cyber-physical systems. Correctness properties of such programs take the form of queries that seek the probabilities of assertions over program variables. We present a static analysis approach that provides guaranteed interval bounds on the values (assertion probabilities) of such queries. First, we observe that for probabilistic programs, it is possible to conclude facts about the behavior of the entire program by choosing a finite, adequate set of its paths. We provide strategies for choosing such a set of paths and verifying its adequacy. The queries are evaluated over each path by a combination of symbolic execution and probabilistic volume-bound computations. Each path yields interval bounds that can be summed up with a "coverage" bound to yield an interval that encloses the probability of assertion for the program as a whole. We demonstrate promising results on a suite of benchmarks from many different sources including robotic manipulators and medical decision making programs.
Article
We present Theseus, an IDE extension that visualizes run-time behavior within a JavaScript code editor. By displaying real-time information about how code actually behaves during execution, Theseus proactively addresses misconceptions by drawing attention to similarities and differences between the programmer's idea of what code does and what it actually does. To understand how programmers would respond to this kind of an always-on visualization, we ran a lab study with graduate students, and interviewed 9 professional programmers who were asked to use Theseus in their day-to-day work. We found that users quickly adopted strategies that are unique to always-on, real-time visualizations, and used the additional information to guide their navigation through their code.
Article
While emergent behaviors are uncodified across many domains such as programming and writing, interfaces need explicit rules to support users. We hypothesize that by codifying emergent programming behavior, software engineering interfaces can support a far broader set of developer needs. To explore this idea, we built Codex, a knowledge base that records common practice for the Ruby programming language by indexing over three million lines of popular code. Codex enables new data-driven interfaces for programming systems: statistical linting, identifying code that is unlikely to occur in practice and may constitute a bug; pattern annotation, automatically discovering common programming idioms and annotating them with metadata using expert crowdsourcing; and library generation, constructing a utility package that encapsulates and reflects emergent software practice. We evaluate these applications to find Codex captures a broad swatch of programming practice, statistical linting detects problematic code snippets, and pattern annotation discovers nontrivial idioms such as basic HTTP authentication and database migration templates. Our work suggests that operationalizing practice-driven knowledge in structured domains such as programming can enable a new class of user interfaces.
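Codex's statistical linting is described as flagging code that is unlikely to occur in a large index of common practice. The sketch below approximates that idea with parent-child AST node-pair frequencies over a tiny toy corpus; the feature choice, corpus, and threshold are invented and far cruder than Codex's index of millions of lines of Ruby.

```python
import ast
from collections import Counter

def node_pairs(source):
    """Parent-child AST node-type pairs: a crude stand-in for Codex's indexed patterns."""
    pairs = Counter()
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            pairs[(type(parent).__name__, type(child).__name__)] += 1
    return pairs

# Toy "popular code" corpus standing in for Codex's index of millions of lines.
CORPUS = node_pairs("total = 0\nfor x in items:\n    total += x\nprint(total)")

def statistical_lint(snippet, min_count=1):
    """Flag constructs seen fewer than min_count times in the corpus as unusual."""
    return [pair for pair, n in node_pairs(snippet).items() if CORPUS[pair] < min_count]

print(statistical_lint("while True:\n    pass"))
```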
Conference Paper
Photo editing can be a challenging task, and it becomes even more difficult on the small, portable screens of mobile devices that are now frequently used to capture and edit images. To address this problem we present PixelTone, a multimodal photo editing interface that combines speech and direct manipulation. We observe existing image editing practices and derive a set of principles that guide our design. In particular, we use natural language for expressing desired changes to an image, and sketching to localize these changes to specific regions. To support the language commonly used in photo-editing we develop a customized natural language interpreter that maps user phrases to specific image processing operations. Finally, we perform a user study that evaluates and demonstrates the effectiveness of our interface.
Article
Programmers frequently use instructive code examples found on the Web to overcome cognitive barriers while programming. These examples couple the concrete functionality of code with rich contextual information about how the code works. However, using these examples necessitates understanding, configuring, and integrating the code, all of which typically take place after the example enters the user's code and has been removed from its original instructive context. In short, a user's interaction with an example continues well after the code is pasted. This paper investigates whether treating examples as "first-class" objects in the code editor - rather than simply as strings of text - will allow programmers to use examples more effectively. We explore this through the creation and evaluation of Codelets. A Codelet is presented inline with the user's code, and consists of a block of example code and an interactive helper widget that assists the user in understanding and integrating the example. The Codelet persists throughout the example's lifecycle, remaining accessible even after configuration and integration is done. A comparative laboratory study with 20 participants found that programmers were able to complete tasks involving examples an average of 43% faster when using Codelets than when using a standard Web browser.
Article
We present a new technique, failure-oblivious computing, that enables servers to execute through memory errors without memory corruption. Our safe compiler for C inserts checks that dynamically detect invalid memory accesses. Instead of terminating or throwing an exception, the generated code simply discards invalid writes and manufactures values to return for invalid reads, enabling the server to continue its normal execution path. We have applied failure-oblivious computing to a set of widely-used servers from the Linux-based open-source computing environment. Our results show that our techniques 1) make these servers invulnerable to known security attacks that exploit memory errors, and 2) enable the servers to continue to operate successfully to service legitimate requests and satisfy the needs of their users even after attacks trigger their memory errors. We observed several reasons for this successful continued execution. When the memory errors occur in irrelevant computations, failure-oblivious computing enables the server to execute through the memory errors to continue on to execute the relevant computation. Even when the memory errors occur in relevant computations, failure-oblivious computing converts requests that trigger unanticipated and dangerous execution paths into anticipated invalid inputs, which the error-handling logic in the server rejects. Because servers tend to have small error propagation distances (localized errors in the computation for one request tend to have little or no effect on the computations for subsequent requests), redirecting reads that would otherwise cause addressing errors and discarding writes that would otherwise corrupt critical data structures (such as the call stack) localizes the effect of the memory errors, prevents addressing exceptions from terminating the computation, and enables the server to continue on to successfully process subsequent requests. The overall result is a substantial extension of the range of requests that the server can successfully process.
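Failure-oblivious computing is implemented as a C compiler transformation; purely as an analogue of the semantics described (discard invalid writes, manufacture values for invalid reads), here is a small Python container sketch. It illustrates the behavior only and is not the paper's mechanism.

```python
class FailureObliviousList(list):
    """Analogue of failure-oblivious semantics for a Python list: out-of-bounds
    writes are silently discarded and out-of-bounds reads return a manufactured value."""

    MANUFACTURED = 0   # value invented for invalid reads

    def __getitem__(self, index):
        try:
            return super().__getitem__(index)
        except IndexError:
            return self.MANUFACTURED

    def __setitem__(self, index, value):
        try:
            super().__setitem__(index, value)
        except IndexError:
            pass   # drop the invalid write instead of terminating

xs = FailureObliviousList([10, 20, 30])
xs[99] = 7             # discarded rather than raising
print(xs[99], xs)      # -> 0 [10, 20, 30]
```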
Article
When faced with the need for documentation, examples, bug fixes, error descriptions, code snippets, workarounds, templates, patterns, or advice, software developers frequently turn to their web browser. Web resources both organized and authoritative as well as informal and community-driven are heavily used by developers. The time and attention devoted to finding (or re-finding) and navigating these sites is significant. We present Codetrail, a system that demonstrates how the developer's use of web resources can be improved by connecting the Eclipse integrated development environment (IDE) and the Firefox web browser. Codetrail uses a communication channel and shared data model between these applications to implement a variety of integrative tools. By combining information previously available only to the IDE or the web browser alone (such as editing history, code contents, and recent browsing), Codetrail can automate previously manual tasks and enable new interactions that exploit the marriage of data and functionality from Firefox and Eclipse. Just as the IDE will change the contents of peripheral views to focus on the particular code or task with which the developer is engaged, so, too, the web browser can be focused on the developer's current context and task.
Conference Paper
Automatic program repair has been a longstanding goal in software engineering, yet debugging remains a largely manual process. We introduce a fully automated method for locating and repairing bugs in software. The approach works on off-the-shelf legacy applications and does not require formal specifications, program annotations or special coding practices. Once a program fault is discovered, an extended form of genetic programming is used to evolve program variants until one is found that both retains required functionality and also avoids the defect in question. Standard test cases are used to exercise the fault and to encode program requirements. After a successful repair has been discovered, it is minimized using structural differencing algorithms and delta debugging. We describe the proposed method and report experimental results demonstrating that it can successfully repair ten different C programs totaling 63,000 lines in under 200 seconds, on average.
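As a toy illustration of the generate-and-validate loop this abstract describes, the sketch below mutates a faulty one-line function by picking from a small pool of candidate edits and keeps the first variant that passes the test cases. The mutation operator, candidate pool, and tests are invented and far simpler than the paper's genetic programming over real C programs.

```python
import random

def tests_pass(fn):
    """Standard test cases: they encode required behavior and exercise the fault."""
    return fn(2, 3) == 5 and fn(-1, 1) == 0 and fn(0, 0) == 0

# Pool of candidate statement-level edits for the faulty body "return a - b".
CANDIDATE_BODIES = ["return a - b", "return a + b", "return a * b", "return b - a"]

def make_fn(body):
    namespace = {}
    exec(f"def add(a, b):\n    {body}", namespace)
    return namespace["add"]

def repair(max_iters=100):
    """Generate-and-validate: mutate, run the tests, keep the first passing variant."""
    for _ in range(max_iters):
        body = random.choice(CANDIDATE_BODIES)   # stand-in for a genetic-programming mutation
        if tests_pass(make_fn(body)):
            return body
    return None

print(repair())   # typically -> 'return a + b'
```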
Conference Paper
Interpreting compiler error messages is challenging for novice programmers. Presenting examples of how other programmers have corrected similar errors may help novices understand and correct such errors. This paper introduces HelpMeOut, a social recommender system that aids debugging by suggesting solutions that peers have applied in the past. HelpMeOut comprises IDE instrumentation to collect examples of code changes that fix compiler errors; a central database that stores fix reports from many users; and a suggestion interface that, given an error, queries the database and presents relevant fixes to the programmer.
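To illustrate the suggestion step described here (given an error, query a database of past fixes), the following sketch matches a new error message against an invented in-memory list of fix reports using a generic string-similarity measure. None of this reflects HelpMeOut's actual schema or ranking.

```python
from difflib import SequenceMatcher

# Invented stand-in for HelpMeOut's central database of fix reports.
FIX_REPORTS = [
    {"error": "';' expected", "before": "int x = 1", "after": "int x = 1;"},
    {"error": "cannot find symbol: variable lenght", "before": "lenght", "after": "length"},
]

def suggest_fixes(error_message, top_k=1):
    """Return past fixes whose recorded error text best matches the new error."""
    ranked = sorted(
        FIX_REPORTS,
        key=lambda r: SequenceMatcher(None, r["error"], error_message).ratio(),
        reverse=True,
    )
    return ranked[:top_k]

print(suggest_fixes("error: ';' expected at end of statement"))
```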
Conference Paper
Programmers frequently use the Web while writing code: they search for libraries, code examples, tutorials, and documentation. This link between code and visited Web pages remains implicit today. Connecting source code and browsing histories might help programmers maintain context, reduce the cost of Web page re-retrieval, and enhance understanding when code is shared. This note introduces HyperSource, an IDE augmentation that associates browsing histories with source code edits. HyperSource comprises a browser extension that logs visited pages; an IDE that tracks user activity and maps pages to code edits; a source document model that tracks visited pages at a character level; and a user interface that enables interaction with these histories. We discuss relevance heuristics and privacy issues inherent in this approach. Informal log analyses and user feedback suggest that our annotation model is promising for code editing and might also apply to other document authoring tasks after refinement.
Conference Paper
It is becoming increasingly important for applications to protect sensitive data. With current techniques, the programmer bears the burden of ensuring that the application's behavior adheres to policies about where sensitive values may flow. Unfortunately, privacy policies are difficult to manage because their global nature requires coordinated reasoning and enforcement. To address this problem, we describe a programming model that makes the system responsible for ensuring adherence to privacy policies. The programming model has two components: 1) core programs describing functionality independent of privacy concerns and 2) declarative, decentralized policies controlling how sensitive values are disclosed. Each sensitive value encapsulates multiple views; policies describe which views are allowed based on the output context. The system is responsible for automatically ensuring that outputs are consistent with the policies. We have implemented this programming model in a new functional constraint language named Jeeves. In Jeeves, sensitive values are introduced as symbolic variables and policies correspond to constraints that are resolved at output channels. We have implemented Jeeves as a Scala library using an SMT solver as a model finder. In this paper we describe the dynamic and static semantics of Jeeves and the properties about policy enforcement that the semantics guarantees. We also describe our experience implementing a conference management system and a social network.
Conference Paper
Mechanical Turk (MTurk) provides an on-demand source of human computation. This provides a tremendous opportunity to explore algorithms which incorporate human computation as a function call. However, various systems challenges make this difficult in practice, and most uses of MTurk post large numbers of independent tasks. TurKit is a toolkit for prototyping and exploring algorithmic human computation, while maintaining a straight-forward imperative programming style. We present the crash-and-rerun programming model that makes TurKit possible, along with a variety of applications for human computation algorithms. We also present case studies of TurKit used for real experiments across different fields.
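The crash-and-rerun model described here records the results of expensive or nondeterministic operations so that the whole script can safely be re-executed from the top. A minimal Python sketch of that record-and-replay idea follows, with an in-memory store and a fake crowd call; TurKit itself is a JavaScript toolkit over MTurk, so every name here is illustrative.

```python
MEMO = {}                 # a real crash-and-rerun store would be persisted to disk
_call_site = {"n": 0}

def once(fn, *args):
    """Run fn the first time this call site is reached; replay its result on reruns."""
    _call_site["n"] += 1
    key = (_call_site["n"], fn.__name__, args)
    if key not in MEMO:
        MEMO[key] = fn(*args)       # e.g. posting a task to the crowd
    return MEMO[key]

def ask_crowd(question):
    print(f"(pretending to ask the crowd: {question!r})")
    return "yes"

def script():
    _call_site["n"] = 0             # deterministic replay: reset the call-site counter
    a = once(ask_crowd, "Is this photo blurry?")
    b = once(ask_crowd, "Is it well lit?")
    return a, b

print(script())   # first run actually "asks the crowd"
print(script())   # rerun replays the recorded answers without re-asking
```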
Conference Paper
In an object-oriented program, a unit test often consists of a sequence of method calls that create and mutate objects, then use them as arguments to a method under test. It is challenging to automatically generate sequences that are legal and behaviorally-diverse, that is, reaching as many different program states as possible. This paper proposes a combined static and dynamic automated test generation approach to address these problems, for code without a formal specification. Our approach first uses dynamic analysis to infer a call sequence model from a sample execution, then uses static analysis to identify method dependence relations based on the fields they may read or write. Finally, both the dynamically-inferred model (which tends to be accurate but incomplete) and the statically-identified dependence information (which tends to be conservative) guide a random test generator to create legal and behaviorally-diverse tests. Our Palus tool implements this testing approach. We compared its effectiveness with a pure random approach, a dynamic-random approach (without a static phase), and a static-random approach (without a dynamic phase) on several popular open-source Java programs. Tests generated by Palus achieved higher structural coverage and found more bugs. Palus is also internally used in Google. It has found 22 previously-unknown bugs in four well-tested Google products.
Conference Paper
We present a new symbolic execution tool, KLEE, capable of automatically generating tests that achieve high coverage on a diverse set of complex and environmentally-intensive programs. We used KLEE to thoroughly check all 89 stand-alone programs in the GNU COREUTILS utility suite, which form the core user-level environment installed on millions of Unix systems, and arguably are the single most heavily tested set of open-source programs in existence. KLEE-generated tests achieve high line coverage (on average over 90% per tool; median over 94%) and significantly beat the coverage of the developers' own hand-written test suite. When we did the same for 75 equivalent tools in the BUSYBOX embedded system suite, results were even better, including 100% coverage on 31 of them. We also used KLEE as a bug-finding tool, applying it to 452 applications (over 430K total lines of code), where it found 56 serious bugs, including three in COREUTILS that had been missed for over 15 years. Finally, we used KLEE to crosscheck purportedly identical BUSYBOX and COREUTILS utilities, finding functional correctness errors and a myriad of inconsistencies.
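The underlying strategy, explore each branch of the program, accumulate the path condition, and solve it to obtain one concrete test input per feasible path, can be sketched in a few lines of Python. The sketch hand-encodes the path conditions of a toy function and replaces the SMT solver with a brute-force search over a small integer domain; a real symbolic executor such as KLEE derives and solves those conditions automatically.

def program_paths():
    """Branch structure of a toy program, written as explicit path conditions.

        def f(x):
            if x > 10:
                if x % 2 == 0: return "big-even"
                else:          return "big-odd"
            else:              return "small"
    """
    return [
        ([lambda x: x > 10, lambda x: x % 2 == 0], "big-even"),
        ([lambda x: x > 10, lambda x: x % 2 != 0], "big-odd"),
        ([lambda x: x <= 10],                      "small"),
    ]

def solve(path_condition, domain=range(-100, 100)):
    """Stand-in for an SMT query: find any input satisfying all constraints."""
    for x in domain:
        if all(c(x) for c in path_condition):
            return x
    return None

# One generated test input per feasible path gives full path coverage of f.
for condition, label in program_paths():
    print(label, "is reached by input", solve(condition))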
Article
Large APIs can be hard to learn, and this can lead to decreased programmer productivity. But what makes APIs hard to learn? We conducted a mixed approach, multi-phased study of the obstacles faced by Microsoft developers learning a wide variety of new APIs. The study involved a combination of surveys and in-person interviews, and collected the opinions and experiences of over 440 professional developers. We found that some of the most severe obstacles faced by developers learning new APIs pertained to the documentation and other learning resources. We report on the obstacles developers face when learning new APIs, with a special focus on obstacles related to API documentation. Our qualitative analysis elicited five important factors to consider when designing API documentation: documentation of intent; code examples; matching APIs with scenarios; penetrability of the API; and format and presentation. We analyzed how these factors can be interpreted to prioritize API documentation development efforts.
Article
Random testing of programs has usually (but not always) been viewed as a worst case of program testing. Testing strategies that take into account the program structure are generally preferred. Path testing is an often proposed ideal for structural testing. Path testing is treated here as an instance of partition testing, where by partition testing is meant any testing scheme which forces execution of at least one test case from each subset of a partition of the input domain. Simulation results are presented which suggest that random testing may often be more cost effective than partition testing schemes. Also, results of actual random testing experiments are presented which confirm the viability of random testing as a useful validation tool.
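A small simulation makes the comparison concrete: when failures do not line up with the chosen partition, drawing one test from each partition subset detects the bug at about the same rate as plain random testing with the same budget. The function, partition, and failure pattern below are all made up for the illustration.

import random

def buggy_identity(x):
    # Intentional bug on a thin slice of the domain that cuts across partitions.
    return x + 1 if x % 317 == 5 else x

def detects_bug(x):
    return buggy_identity(x) != x

# An arbitrary three-way partition of the input domain.
partitions = [range(-10_000, -3_000), range(-3_000, 3_000), range(3_000, 10_000)]

def partition_testing(rng):
    # At least one randomly chosen case from each subset of the partition.
    return any(detects_bug(rng.choice(p)) for p in partitions)

def random_testing(rng, budget=3):
    # The same budget of three cases, drawn uniformly from the whole domain.
    return any(detects_bug(rng.randint(-10_000, 9_999)) for _ in range(budget))

rng, trials = random.Random(1), 20_000
print("partition testing detection rate:",
      sum(partition_testing(rng) for _ in range(trials)) / trials)
print("random testing detection rate:   ",
      sum(random_testing(rng) for _ in range(trials)) / trials)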
Article
Keyword programming is a novel technique for reducing the need to remember details of programming language syntax and APIs, by translating a small number of unordered keywords provided by the user into a valid expression. In a sense, the keywords act as a query that searches the space of expressions that are valid in the given context. Prior work has demonstrated the feasibility and merit of this approach in limited domains. This paper explores the potential for employing this technique in much larger domains, specifically general-purpose programming languages like Java. We present an algorithm for translating keywords into Java method call expressions. When tested on keywords extracted from existing method calls in Java code, the algorithm can accurately reconstruct over 90% of the original expressions. We tested the algorithm on keywords provided by users in a web-based study. The results suggest that users can obtain correct Java code using keyword queries as accurately as they can write the correct Java code themselves. We implemented the algorithm in an Eclipse plug-in as an extension to the autocomplete mechanism and deployed it in a preliminary field study of several users, with mixed results. One interesting result of this work is that most of the information in Java method call expressions lies in the keywords, and details of punctuation and even parameter ordering can often be inferred automatically.
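The flavor of the technique can be shown with a toy Python sketch that treats the unordered keywords as a query over candidate call expressions and returns the best-scoring one. The candidate "API" below is invented for illustration; the paper's actual algorithm targets Java method-call expressions and a much richer scoring model.

CANDIDATES = [
    ("message.send", ["recipient", "body"]),
    ("message.reply", ["body"]),
    ("file.read_lines", ["path"]),
    ("list.sort_by", ["key", "reverse"]),
]

def tokens(name):
    return set(name.replace(".", "_").split("_"))

def score(keywords, func, params):
    # Count keyword overlap with the function name and its parameter names.
    words = tokens(func) | set().union(*(tokens(p) for p in params))
    return len(set(keywords) & words)

def keyword_to_call(keywords):
    func, params = max(CANDIDATES, key=lambda c: score(keywords, *c))
    return f"{func}({', '.join(params)})"

print(keyword_to_call(["send", "message", "body"]))   # message.send(recipient, body)
print(keyword_to_call(["read", "file", "lines"]))     # file.read_lines(path)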
Article
A type system is a method for automatically detecting the absence of certain erroneous behaviors by classifying the phrases of a program according to the values they compute. This work offers a pragmatic, operational introduction to the theory of type systems and programming languages, with a view to facilitating their use in software engineering, programming language design, high-performance compilers, and system security.
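A minimal illustration of that idea in Python: a checker that classifies the phrases of a tiny expression language by the type of value they compute and rejects programs that would misapply an operator, before they run. The language and its tuple encoding are invented for the example.

def typecheck(expr):
    """Return 'int' or 'bool' for well-typed expressions, else raise TypeError."""
    kind = expr[0]
    if kind == "lit":
        is_int = isinstance(expr[1], int) and not isinstance(expr[1], bool)
        return "int" if is_int else "bool"
    if kind == "add":
        if typecheck(expr[1]) == "int" and typecheck(expr[2]) == "int":
            return "int"
        raise TypeError("'add' expects two integers")
    if kind == "less":
        if typecheck(expr[1]) == "int" and typecheck(expr[2]) == "int":
            return "bool"
        raise TypeError("'less' expects two integers")
    if kind == "if":
        if typecheck(expr[1]) != "bool":
            raise TypeError("'if' condition must be boolean")
        then_t, else_t = typecheck(expr[2]), typecheck(expr[3])
        if then_t != else_t:
            raise TypeError("'if' branches must have the same type")
        return then_t
    raise TypeError(f"unknown expression kind: {kind}")

# Accepted: (if (less 1 2) (add 1 2) 0) has type int.
print(typecheck(("if", ("less", ("lit", 1), ("lit", 2)),
                       ("add", ("lit", 1), ("lit", 2)), ("lit", 0))))
# Rejected before running: adding a boolean to an integer.
try:
    typecheck(("add", ("lit", True), ("lit", 1)))
except TypeError as err:
    print("rejected:", err)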
Article
This paper describes how a program called dup can be used to locate instances of duplication or near-duplication in a software system. Dup reports both textually identical sections of code and sections that are the same textually except for systematic substitution of one set of variable names and constants for another. Further processing locates longer sections of code that are the same except for other small modifications. Experimental results from running dup on millions of lines from two large software systems show dup to be both effective at locating duplication and fast. Applications could include identifying sections of code that should be replaced by procedures, elimination of duplication during reengineering of the system, redocumentation to include references to copies, and debugging.
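A small Python sketch of the parameterized-duplicate idea: identifier-like tokens and numeric constants are normalized to placeholders, so two fragments that differ only by a systematic renaming compare equal after normalization. dup itself scales to millions of lines and reports maximal matching sections; this toy only pairs up matching lines.

import re

def normalize(line):
    line = re.sub(r"\b[A-Za-z_]\w*\b", "ID", line)   # identifier-like tokens
    return re.sub(r"\b\d+\b", "NUM", line)           # numeric constants

def duplicate_line_pairs(code_a, code_b):
    """Report (line_in_a, line_in_b) pairs that match after normalization."""
    index = {}
    for i, line in enumerate(code_a.splitlines(), start=1):
        index.setdefault(normalize(line.strip()), []).append(i)
    pairs = []
    for j, line in enumerate(code_b.splitlines(), start=1):
        for i in index.get(normalize(line.strip()), []):
            pairs.append((i, j))
    return pairs

fragment_a = """total = 0
for price in prices:
    total = total + price"""

fragment_b = """weight = 0
for item in items:
    weight = weight + item"""

print(duplicate_line_pairs(fragment_a, fragment_b))  # every line pairs up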
Rage-quit: Coder unpublished 17 lines of JavaScript and "broke the Internet"
  • Ars Technica
Ars Technica. Rage-quit: Coder unpublished 17 lines of JavaScript and "broke the Internet". 2016.
Microtask programming: Building software with a crowd
  • T D Latoza
  • W B Towne
  • C M Adriano
  • A Van Der Hoek
LaToza, T.D., Towne, W.B., Adriano, C.M., and van der Hoek, A. Microtask programming: Building software with a crowd. In Proc. UIST 2014.