Martin Rinard's research while affiliated with Massachusetts Institute of Technology and other places

Publications (419)

Chapter
We study the impact of player capability on social welfare in congestion games. We introduce a new game, the Distance-bounded Network Congestion game (DNC), as the basis of our study. DNC is a symmetric network congestion game with a bound on the number of edges each player can use. We show that DNC is PLS-complete in contrast to standard symmetric...
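The Python sketch below illustrates the game's ingredients under stated assumptions (the tiny graph, latency tables, and player count are invented for illustration and are unrelated to the paper's PLS-completeness construction): players pick s-to-t paths of at most `bound` edges, and best-response dynamics runs until no player can lower their own cost.

```python
# A hedged sketch of a Distance-bounded Network Congestion (DNC) game:
# a symmetric network congestion game in which each player routes s -> t
# using at most `bound` edges.  The graph, latency tables, and player
# count are invented for illustration.
def simple_paths(edges, bound, node="s", seen=("s",)):
    """Enumerate simple s-t paths with at most `bound` edges (brute force)."""
    if node == "t":
        yield ()
        return
    if bound == 0:
        return
    for (u, v) in edges:
        if u == node and v not in seen:
            for rest in simple_paths(edges, bound - 1, v, seen + (v,)):
                yield ((u, v),) + rest

def path_cost(path, profile, edges):
    load = {}
    for chosen in profile:
        for e in chosen:
            load[e] = load.get(e, 0) + 1
    return sum(edges[e][load[e] - 1] for e in path)

def best_response_dynamics(edges, n_players, bound):
    choices = list(simple_paths(edges, bound))
    profile = [choices[0]] * n_players
    improved = True
    while improved:                       # terminates: congestion games
        improved = False                  # admit a Rosenthal potential
        for i in range(n_players):
            for p in choices:
                trial = profile[:i] + [p] + profile[i + 1:]
                if path_cost(p, trial, edges) < path_cost(profile[i], profile, edges):
                    profile, improved = trial, True
    return profile

# edge -> latency per load (index 0 = one user); supports up to 2 players
edges = {("s", "a"): [1, 4], ("a", "t"): [1, 4], ("s", "t"): [3, 3]}
print(best_response_dynamics(edges, n_players=2, bound=2))
```

Best-response dynamics always reaches a pure equilibrium here because congestion games admit a potential function; PLS-completeness concerns how hard finding such an equilibrium can be in the worst case.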
Preprint
We present a new class of strategic games, mixed capability games, as a foundation for studying how different player capabilities impact the dynamics and outcomes of strategic games. We analyze the impact of different player capabilities via a capability transfer function that characterizes the payoff of each player at equilibrium given capabilitie...
Preprint
We study the impact of player capability on social welfare in congestion games. We introduce a new game, the Distance-bounded Network Congestion game (DNC), as the basis of our study. DNC is a symmetric network congestion game with a bound on the number of edges each player can use. We show that DNC is PLS-complete in contrast to standard symmetric...
Chapter
Researchers have developed neural network verification algorithms motivated by the need to characterize the robustness of deep neural networks. The verifiers aspire to answer whether a neural network guarantees certain properties with respect to all inputs in a space. However, many verifiers inaccurately model floating point arithmetic but do not t...
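A minimal illustration of the underlying gap (a toy example, not the paper's technique): float32 addition is not associative, so a verifier that reasons over exact reals can certify behavior that a particular floating point evaluation order of the same computation violates.

```python
# Float32 addition is not associative: two mathematically equal groupings
# of the same sum round to different values.  This real-vs-float gap is
# what a verifier that models exact real arithmetic can get wrong.
import numpy as np

w = np.float32([1e8, 1.0, -1e8])        # illustrative values
left_to_right = (w[0] + w[1]) + w[2]    # 1e8 + 1 rounds back to 1e8 -> 0.0
regrouped = (w[0] + w[2]) + w[1]        # exact cancellation first   -> 1.0
print(left_to_right, regrouped)         # prints: 0.0 1.0
```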
Preprint
Software developers must often replace existing components in their systems to adapt to evolving environments or tooling. While traditional code search systems are effective at retrieving components with related functionality, it is much more challenging to retrieve components that can be used to directly replace existing functionality, as replacem...
Chapter
Deep neural networks are an attractive tool for compressing the control policy lookup tables in systems such as the Airborne Collision Avoidance System (ACAS). It is vital to ensure the safety of such neural controllers via verification techniques. The problem of analyzing ACAS Xu networks has motivated many successful neural network verifiers. The...
Article
We present a dataflow model for modelling parallel Unix shell pipelines. To accurately capture the semantics of complex Unix pipelines, the dataflow model is order-aware, i.e., the order in which a node in the dataflow graph consumes inputs from different edges plays a central role in the semantics of the computation and therefore in the resulting...
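A minimal sketch of what order-awareness means, under the assumption that a node like Unix `cat f1 f2` is modeled as draining its first input edge completely before its second; the Python stand-in below is illustrative, not the paper's formal model.

```python
# An order-aware node, illustratively: like `cat f1 f2`, it must drain its
# first input edge completely before reading its second; an orderless merge
# of the two edges would compute a different (wrong) result.
def cat_node(edge1, edge2):
    for item in edge1:    # consumption order is part of the semantics
        yield item
    for item in edge2:
        yield item

assert list(cat_node(iter("ab"), iter("cd"))) == ["a", "b", "c", "d"]
```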
Preprint
Deep neural networks are an attractive tool for compressing the control policy lookup tables in systems such as the Airborne Collision Avoidance System (ACAS). It is vital to ensure the safety of such neural controllers via verification techniques. The problem of analyzing ACAS Xu networks has motivated many successful neural network verifiers. The...
Article
Automated machine learning (AutoML) promises to democratize machine learning by automatically generating machine learning pipelines with little to no user intervention. Typically, a search procedure is used to repeatedly generate and validate candidate pipelines, maximizing a predictive performance metric, subject to a limited execution time budget...
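A hedged sketch of the generate-and-validate loop described here, using scikit-learn as an illustrative component pool (the components, search order, and scoring are assumptions, not the paper's system):

```python
# A hedged sketch of the generate-and-validate loop (component pool, search
# order, and scoring are illustrative assumptions, not the paper's system).
from itertools import product
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scalers = [StandardScaler, MinMaxScaler]
models = [lambda: LogisticRegression(max_iter=200), DecisionTreeClassifier]

best_score, best_pipe = -1.0, None
for scaler, model in product(scalers, models):        # generate candidates
    pipe = make_pipeline(scaler(), model())
    score = cross_val_score(pipe, X, y, cv=3).mean()  # validate each one
    if score > best_score:                            # keep the best found
        best_score, best_pipe = score, pipe
print(round(best_score, 3), best_pipe)
```

A real system bounds this loop with an execution time budget and searches over a far larger component pool.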
Preprint
Full-text available
A recent line of work has shown that deep networks are highly susceptible to backdoor data poisoning attacks. Specifically, by injecting a small amount of malicious data into the training distribution, an adversary gains the ability to control the model's behavior during inference. In this work, we propose an iterative training procedure for removi...
Preprint
We present a new synthesis algorithm to solve program synthesis over noisy datasets, i.e., data that may contain incorrect/corrupted input-output examples. Our algorithm uses an abstraction refinement based optimization process to synthesize programs which optimize the tradeoff between the loss over the noisy dataset and the complexity of the synth...
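A toy illustration of the objective being optimized, not the paper's abstraction-refinement algorithm: among candidate programs, pick the one minimizing loss over the noisy examples plus a complexity penalty (the affine DSL and the weight `lam` are invented for illustration).

```python
# A toy version of the objective (the affine DSL and weight `lam` are
# invented; the paper's contribution is an abstraction-refinement search,
# not this brute-force enumeration): minimize loss + lam * complexity.
examples = [(0, 1), (1, 3), (2, 5), (3, 99)]      # last pair is corrupted

def candidates():                                  # DSL: f(x) = a*x + b
    for a in range(-3, 4):
        for b in range(-3, 4):
            yield (a, b)

def loss(prog):                                    # 0/1 loss over examples
    a, b = prog
    return sum(a * x + b != y for x, y in examples)

def complexity(prog):
    a, b = prog
    return abs(a) + abs(b)

lam = 0.1
best = min(candidates(), key=lambda p: loss(p) + lam * complexity(p))
print(best)   # (2, 1): fits three examples and tolerates the corrupted one
```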
Preprint
Full-text available
We present BIEBER (Byte-IdEntical Binary parsER), the first system to model and regenerate a full working parser from instrumented program executions. To achieve this, BIEBER exploits the regularity (e.g., header fields and array-like data structures) that is commonly found in file formats. Key generalization steps derive strided loops that parse i...
Preprint
We explore and formalize the task of synthesizing programs over noisy data, i.e., data that may contain corrupted input-output examples. By formalizing the concept of a Noise Source, an Input Source, and a prior distribution over programs, we formalize the probabilistic process which constructs a noisy dataset. This formalism allows us to define th...
Preprint
A key challenge for reinforcement learning is solving long-horizon planning and control problems. Recent work has proposed leveraging programs to help guide the learning algorithm in these settings. However, these approaches impose a high manual burden on the user since they must provide a guiding program for every new task they seek to achieve. We...
Article
We present Konure, a new system that uses active learning to infer models of applications that retrieve data from relational databases. Konure comprises a domain-specific language (each model is a program in this language) and an associated inference algorithm that infers models of applications whose behavior can be expressed in this language. The...
Article
394 Background: Previous work by our group has demonstrated that leveraging Machine Learning on diagnostic codes from Electronic Health Records (EHRs) can identify individuals at high risk for Pancreatic Duct Adenocarcinoma (PDAC), as early as 1 year before current cancer diagnosis. We aim to improve the performance of our existing PDAC risk strat...
Preprint
We study the problem of inferring communication structures that can solve cooperative multi-agent planning problems while minimizing the amount of communication. We quantify the amount of communication as the maximum degree of the communication graph; this metric captures settings where agents have limited bandwidth. Minimizing communication is cha...
Article
Aim: Pancreatic ductal adenocarcinoma (PDAC) is often diagnosed at a late, incurable stage. We sought to determine whether individuals at high risk of developing PDAC could be identified early using routinely collected data. Methods: Electronic health record (EHR) databases from two independent hospitals in Boston, Massachusetts, providing inpatient...
Preprint
We present KumQuat, a system for automatically synthesizing parallel and distributed versions of Unix shell commands. KumQuat follows a divide-and-conquer approach, decomposing commands into (i) a parallel mapper applying the original command to produce partial results, and (ii) an ordered combiner that combines the partial results into the final o...
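A hedged Python stand-in for the decomposition (KumQuat synthesizes shell code; here `sorted` plays the role of the original command applied per partition, and `heapq.merge` plays an order-aware combiner, as `sort -m` does for `sort`):

```python
# A hedged Python stand-in for the mapper/combiner decomposition.
import heapq
from concurrent.futures import ThreadPoolExecutor

def mapper(chunk):                  # (i) original command on one partition
    return sorted(chunk)

def combiner(partials):             # (ii) ordered combination of results
    return list(heapq.merge(*partials))

def parallel_sort(lines, workers=4):
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(workers) as pool:
        partials = list(pool.map(mapper, chunks))
    return combiner(partials)

assert parallel_sort(["b", "d", "a", "c"]) == ["a", "b", "c", "d"]
```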
Preprint
We present a dataflow model for extracting data parallelism latent in Unix shell scripts. To accurately capture the semantics of Unix shell scripts, the dataflow model is order-aware, i.e., the order in which a node in the dataflow graph consumes inputs from different edges plays a central role in the semantics of the computation and therefore in t...
Preprint
We present the Sum-Product Probabilistic Language (SPPL), a new system that automatically delivers exact solutions to a broad range of probabilistic inference queries. SPPL symbolically represents the full distribution on execution traces specified by a probabilistic program using a generalization of sum-product networks. SPPL handles continuous an...
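SPPL itself is not sketched here; instead, the following illustrates the kind of query it answers exactly: enumerate every execution trace of a tiny discrete probabilistic program and compute an exact conditional probability, with no sampling or approximation. The program and query are invented for illustration.

```python
# Program: rain ~ Bernoulli(1/5); sprinkler ~ Bernoulli(1/2);
#          wet = rain or sprinkler.   Query: P(rain | wet).
from fractions import Fraction
from itertools import product

def trace_prob(rain, sprinkler):
    p = Fraction(1, 5) if rain else Fraction(4, 5)
    return p * Fraction(1, 2)       # sprinkler is 1/2 either way

num = den = Fraction(0)
for rain, sprinkler in product([True, False], repeat=2):
    if rain or sprinkler:           # condition on the event "wet"
        p = trace_prob(rain, sprinkler)
        den += p
        if rain:
            num += p
print(num / den)                    # exactly 1/3
```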
Preprint
We present a new framework and associated synthesis algorithms for program synthesis over noisy data, i.e., data that may contain incorrect/corrupted input-output examples. This framework is based on an extension of finite tree automata called {\em weighted finite tree automata}. We show how to apply this framework to formulate and solve a variety...
Preprint
Leveraging concepts from state machine refinement proofs, we use prophecy variables, which predict information about the future program execution, to enable forward reasoning for backward dataflow analyses. Drawing prophecy and history variables (concepts from the dynamic execution of the program) from the same lattice as the static program analysi...
Article
We present ClearTrack, a system that tracks meta-data for each primitive value in Java programs to detect and nullify a range of vulnerabilities such as integer overflow/underflow and SQL/command injection vulnerabilities. Contributions include new techniques for eliminating false positives associated with benign integer overflows and un...
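A minimal sketch of the core overflow check, with the caveat that ClearTrack instruments Java bytecode rather than Python: compute the mathematically exact result, then flag it if it falls outside the 32-bit range before it can silently wrap.

```python
# Compute the exact result, then flag it if it leaves the int32 range.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def checked_add(a, b):
    r = a + b                        # exact in Python's unbounded ints
    if not (INT32_MIN <= r <= INT32_MAX):
        raise OverflowError(f"int32 overflow: {a} + {b} = {r}")
    return r

try:
    checked_add(INT32_MAX, 1)        # in Java this silently wraps to -2**31
except OverflowError as e:
    print(e)
```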
Article
Background: Application frameworks, such as Ruby on Rails, introduce abstractions with the goal of simplifying development for particular application domains, such as web development. While experts enjoy increased productivity due to these abstractions, the flow of the programs is often hard to understand for non-experts and newcomers due...
Preprint
We present a new approach for synthesizing training data given only a single example of each class. Rather than learn over a large but fixed dataset of examples, we generate our entire training set using only the synthetic examples provided. The goal is to learn a classifier that generalizes to a non-synthetic domain without pretraining or fine-tun...
Preprint
Full-text available
We present a new system, EEV, for verifying binarized neural networks (BNNs). We formulate BNN verification as a Boolean satisfiability problem (SAT) with reified cardinality constraints of the form $y = (x_1 + \cdots + x_n \le b)$, where $x_i$ and $y$ are Boolean variables possibly with negation and $b$ is an integer constant. We also identify two...
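A hedged sketch (not EEV's encoding) of why a binarized neuron is a reified cardinality constraint: with weights in $\{-1,+1\}$, the neuron fires exactly when the count of input literals agreeing with the weight signs reaches a threshold $t$, which is the negation of the $y = (x_1 + \cdots + x_n \le b)$ form with $b = t - 1$. The weights and bias below are invented for illustration.

```python
# With weights w_i in {-1,+1} and inputs x_i in {0,1} mapped to +/-1, the
# neuron fires iff the number of literals agreeing with the weight signs
# reaches a threshold -- a reified cardinality constraint.
from itertools import product

w, c = [1, -1, 1, 1], 0             # invented weights and bias

def neuron(x):
    return sum(wi * (2 * xi - 1) for wi, xi in zip(w, x)) + c >= 0

def as_cardinality(x):
    agree = sum(xi == (wi > 0) for wi, xi in zip(w, x))  # literal count
    threshold = (len(w) - c + 1) // 2                    # ceil((n - c) / 2)
    return agree >= threshold

for x in product([0, 1], repeat=len(w)):
    assert neuron(x) == as_cardinality(x)
```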
Preprint
Manifold regularization is a technique that penalizes the complexity of learned functions over the intrinsic geometry of input data. We develop a connection to learning functions which are "locally stable", and propose new regularization terms for training deep neural networks that are stable against a class of local perturbations. These regularize...
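A naive numpy sketch of the local-stability idea, with the caveat that the paper derives its regularizers from manifold regularization rather than this estimator: penalize the expected change in the model's output under small random input perturbations, and add that penalty to the task loss.

```python
# Penalize output change under small random input perturbations.
import numpy as np

rng = np.random.default_rng(0)

def f(x, w):                          # toy linear model standing in for a DNN
    return x @ w

def stability_penalty(x, w, eps=0.1, samples=8):
    base = f(x, w)
    total = 0.0
    for _ in range(samples):
        delta = rng.uniform(-eps, eps, size=x.shape)
        total += np.mean((f(x + delta, w) - base) ** 2)
    return total / samples            # add to the task loss during training

x = rng.normal(size=(32, 10))
w = rng.normal(size=(10,))
print(stability_penalty(x, w))
```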
Preprint
This paper introduces a new algorithm for the fundamental problem of generating a random integer from a discrete probability distribution using a source of independent and unbiased random coin flips. We prove that this algorithm, which we call the Fast Loaded Dice Roller (FLDR), is highly efficient in both space and time: (i) the size of the sample...
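For context, here is the textbook rejection baseline for the same problem; this is not FLDR itself (FLDR's contribution is achieving near-entropy-optimal space and expected flips): sample index i with probability a_i/m using only fair coin flips.

```python
# Textbook rejection baseline: sample i with probability a_i / m from
# fair bits.  NOT the FLDR algorithm.
import random

def fair_bit():
    return random.getrandbits(1)

def loaded_die(weights):               # positive integers summing to m
    m = sum(weights)
    k = max(1, (m - 1).bit_length())   # flips per attempt: ceil(log2 m)
    while True:
        u = 0
        for _ in range(k):
            u = (u << 1) | fair_bit()  # uniform draw from [0, 2**k)
        if u < m:                      # accept; otherwise flip again
            for i, a in enumerate(weights):
                if u < a:
                    return i
                u -= a

counts = [0, 0, 0]
for _ in range(10_000):
    counts[loaded_die([1, 2, 3])] += 1
print(counts)                          # roughly proportional to 1 : 2 : 3
```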
Preprint
Full-text available
We show how to construct adversarial examples for neural networks with exactly verified robustness against $\ell_{\infty}$-bounded input perturbations by exploiting floating point error. We argue that any exact verification of real-valued neural networks must accurately model the implementation details of any floating point arithmetic used during i...
Article
Reconfigurable analog devices are a powerful new computing substrate especially appropriate for executing computationally intensive dynamical system computations in an energy efficient manner. We present Legno, a compilation toolchain for programmable analog devices. Legno targets the HCDCv2, a programmable analog device designed to execute general...
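A one-dimensional sketch of the scaling problem such a compiler must solve (the system, device range, and scale factor are invented): rescale the dynamical system x' = f(x) via z = s·x so every signal stays within the device's operating range, preserving the dynamics up to the known factor s.

```python
# Rescale x' = f(x) via z = s*x so signals fit the device's range.
def simulate(f, x0, dt, steps):
    x, xs = x0, []
    for _ in range(steps):
        x += dt * f(x)
        xs.append(x)
    return xs

f = lambda x: -0.5 * x + 2.0       # original dynamics: settles near 4.0
device_max = 1.0                   # assumed analog range: |signal| <= 1.0
s = 0.2                            # chosen so the fixed point 4.0*s <= 1.0
g = lambda z: -0.5 * z + 2.0 * s   # induced dynamics of z = s*x

orig = simulate(f, 0.0, 0.01, 2000)
scaled = simulate(g, 0.0, 0.01, 2000)
assert all(abs(z) <= device_max for z in scaled)
assert abs(scaled[-1] - s * orig[-1]) < 1e-6
```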
Article
Full-text available
679 Background: Pancreatic Adenocarcinoma (PDAC) is often diagnosed at an advanced stage. We sought to develop a model for early PDAC prediction in the general population, using electronic health records (EHRs) and machine learning. Methods: We used three EHR datasets from Beth-Israel Deaconess Medical Center (BIDMC) and Partners Healthcare (PHC):...
Preprint
This paper addresses a fundamental problem in random variate generation: given access to a random source that emits a stream of independent fair bits, what is the most accurate and entropy-efficient algorithm for sampling from a discrete probability distribution $(p_1, \dots, p_n)$, where the probabilities of the output distribution $(\hat{p}_1, \d...
Article
In this article, we present Warp, the first open hardware platform designed explicitly to support research in approximate computing. Warp incorporates 21 sensors, computation, and circuit-level facilities designed explicitly to enable approximate computing research, in a 3.6 cm × 3.3 cm × 0.5 cm device. Warp supports a wide range of precision and a...
Article
We consider a usage model for automated machine learning (AutoML) in which users can influence the generated pipeline by providing a weak pipeline specification: an unordered set of API components from which the AutoML system draws the components it places into the generated pipeline. Such specifications allow users to express pr...
Article
This paper addresses a fundamental problem in random variate generation: given access to a random source that emits a stream of independent fair bits, what is the most accurate and entropy-efficient algorithm for sampling from a discrete probability distribution (p1, …, pn), where the probabilities of the output distribution (p̂1, …, p̂n) of the sa...
Article
We present a new technique for automatically synthesizing replacement classes. The technique starts with an original class O and a potential replacement class R, then uses R to synthesize a new class that implements the same interface and provides the same functionality as O. Critically, our technique works with a synthesized inter-class equivale...
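A minimal sketch of the shape of the artifact such a technique produces (all class and method names are invented for illustration): a synthesized class exposing the original class O's interface while delegating to the replacement class R, whose interface differs.

```python
class O:                                 # original: integer stack
    def __init__(self): self.items = []
    def push(self, v): self.items.append(v)
    def pop(self): return self.items.pop()

class R:                                 # replacement, different interface
    def __init__(self): self.data = ()
    def add_last(self, v): self.data += (v,)
    def remove_last(self):
        v, self.data = self.data[-1], self.data[:-1]
        return v

class Synthesized:                       # same interface/behavior as O
    def __init__(self): self._r = R()
    def push(self, v): self._r.add_last(v)
    def pop(self): return self._r.remove_last()

s = Synthesized(); s.push(1); s.push(2)
assert s.pop() == 2 and s.pop() == 1
```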
Article
We present ClearTrack, a system that tracks 32 bits of metadata for each primitive value in Java programs to detect and nullify a range of vulnerabilities such as integer overflow and underflow vulnerabilities, SQL injection vulnerabilities, and command injection vulnerabilities. Contributions include new techniques for eliminating false positives...
Conference Paper
Software applications have grown increasingly complex to deliver the features desired by users. Software modularity has been used as a way to mitigate the costs of developing such complex software. Active learning-based program inference provides an elegant framework that exploits this modularity to tackle development correctness, performance and c...
Article
Software applications have grown increasingly complex to deliver the features desired by users. Software modularity has been used as a way to mitigate the costs of developing such complex software. Active learning-based program inference provides an elegant framework th...
Article
Full-text available
We present AL, a novel automated machine learning system that learns to generate new supervised learning pipelines from an existing corpus of supervised learning programs. In contrast to existing automated machine learning tools, which typically implement a search over manually selected machine learning functions and classes, AL learns to identify...
Article
We present a study that characterizes the way developers use automatically generated patches when fixing software defects. Our study tasked two groups of developers with repairing defects in C programs. Both groups were provided with the defective line of code. One was also provided with five automatically generated and validated patches...
Article
Because of differences between development and production environments, many software performance problems are detected only after software enters production. We present PerformanceHat, a new system that uses profiling information from production executions to develop a global performance model suitable for integration into interactive d...
Preprint
We present a study that characterizes the way developers use automatically generated patches when fixing software defects. Our study tasked two groups of developers with repairing defects in C programs. Both groups were provided with the defective line of code. One was also provided with five automatically generated and validated patches, all of wh...
Preprint
We present new techniques for automatically constructing probabilistic programs for data analysis, interpretation, and prediction. These techniques work with probabilistic domain-specific data modeling languages that capture key properties of a broad class of data generating processes, using Bayesian inference to synthesize probabilistic programs i...
Preprint
Inference metaprogramming enables effective probabilistic programming by supporting the decomposition of executions of probabilistic programs into subproblems and the deployment of hybrid probabilistic inference algorithms that apply different probabilistic inference algorithms to different subproblems. We introduce the concept of independent subpr...
Article
We present Marten, a new end-to-end system for automatically discovering, exploiting, and combining information leakage and buffer overflow vulnerabilities to derandomize and exploit remote, fully randomized processes. Results from two case studies highlight Marten’s ability to generate short, robust ROP chain exploits that bypass address space l...
Preprint
Full-text available
We present the first verification that a neural network produces a correct output within a specified tolerance for every input of interest. We define correctness relative to a specification which identifies 1) a state space consisting of all relevant states of the world and 2) an observation process that produces neural network inputs from the stat...
Article
We present new techniques for automatically constructing probabilistic programs for data analysis, interpretation, and prediction. These techniques work with probabilistic domain-specific data modeling languages that capture key properties of a broad class of data generating processes, using Bayesian inference to synthesize probabilistic programs i...
Conference Paper
Modern out-of-order processors have increased capacity to exploit instruction level parallelism (ILP) and memory level parallelism (MLP), e.g., by using wide superscalar pipelines and vector execution units, as well as deep buffers for inflight memory requests. These resources, however, often exhibit poor utilization rates on workloads with large w...
Article
Modern out-of-order processors have increased capacity to exploit instruction level parallelism (ILP) and memory level parallelism (MLP), e.g., by using wide superscalar pipelines and vector execution units, as well as deep buffers for inflight memory requests. These resources, however, often exhibit poor utilization rates on workloads with large w...
Conference Paper
As modern computation platforms become increasingly complex, their programming interfaces are increasingly difficult to use. This complexity is especially inappropriate given the relatively simple core functionality that many of the computations implement. We present a new approach for obtaining software that executes on modern computing platforms...
Conference Paper
Software correctness and security have been a central issue in the field for decades. Researchers have developed a wide range of approaches to these problems, none of which has solved these problems to date. In this talk I consider two very different approaches to solving correctness and security problems, failure-oblivious computing and domain-spe...
Article
As modern computation platforms become increasingly complex, their programming interfaces are increasingly difficult to use. This complexity is especially inappropriate given the relatively simple core functionality that many of the computations implement. We present a new approach for obtaining software that executes on modern computing platforms...
Article
We previously developed Konure, a tool that uses active learning to infer the functionality of database applications. An alternative approach is to observe the inputs, outputs, and database traffic from a running system in normal use and then synthesize a model of the application from this information. To evaluate these two approaches, we present E...
Article
We present a new technique that uses active learning to infer models of applications that manipulate relational databases. This technique comprises a domain-specific language for modeling applications that access databases (each model is a program in this language) and an associated inference algorithm that infers models of applications whose behav...
Preprint
Modern out-of-order processors have increased capacity to exploit instruction level parallelism (ILP) and memory level parallelism (MLP), e.g., by using wide superscalar pipelines and vector execution units, as well as deep buffers for in-flight memory requests. These resources, however, often exhibit poor utilization rates on workloads with large...
Article
Human color perception acuity is not uniform across colors. This makes it possible to transform drawing programs to generate outputs whose colors are perceptually equivalent but numerically distinct. One benefit of such transformations is lower display power dissipation on organic light-emitting diode (OLED) displays. We introduce Ishihara, a languag...
Conference Paper
Full-text available
In this position paper, we describe our vision of the future of machine programming through a categorical examination of three pillars of research. Those pillars are: (i) intention, (ii) invention, and (iii) adaptation. Intention emphasizes advancements in the human-to-computer and computer-to-machine-learning interfaces. Invention emphasizes the c...
Conference Paper
We introduce inference metaprogramming for probabilistic programming languages, including new language constructs, a formalism, and the first demonstration of effectiveness in practice. Instead of relying on rigid black-box inference algorithms hard-coded into the language implementation as in previous probabilistic programming languages, inference...
Article
We introduce inference metaprogramming for probabilistic programming languages, including new language constructs, a formalism, and the first demonstration of effectiveness in practice. Instead of relying on rigid black-box inference algorithms hard-coded into the language implementation as in previous probabilistic programming languages, inference...
Article
In this position paper, we describe our vision of the future of machine programming through a categorical examination of three pillars of research. Those pillars are: (i) intention, (ii) invention, and (iii) adaptation. Intention emphasizes advancements in the human-to-computer and computer-to-machine-learning interfaces. Invention emphasizes the c...
Preprint
We present Warp, a hardware platform to support research in approximate computing, sensor energy optimization, and energy-scavenged systems. Warp incorporates 11 state-of-the-art sensor integrated circuits, computation, and an energy-scavenged power supply, all within a miniature system that is just 3.6 cm x 3.3 cm x 0.5 cm. Warp's sensor integrate...
Article
Full-text available
The sizes of compressed images depend on their spatial resolution (number of pixels) and on their color resolution (number of color quantization levels). We introduce DaltonQuant, a new color quantization technique for image compression that cloud services can apply to images destined for a specific user with known color vision deficiencies. Dalton...
Article
Full-text available
In this position paper, we describe our vision of the future of machine-based programming through a categorical examination of three pillars of research. Those pillars are: (i) intention, (ii) invention, and (iii) adaptation. Intention emphasizes advancements in the human-to-computer and computer-to-machine-learning interfaces. Invention emphasizes...
Conference Paper
Programmable analog devices are a powerful new computing substrate that are especially appropriate for performing computationally intensive simulations of neuromorphic and cytomorphic models. Current state of the art techniques for configuring analog devices to simulate dynamical systems do not consider the current and voltage operating ranges of a...
Article
Programmable analog devices are a powerful new computing substrate that are especially appropriate for performing computationally intensive simulations of neuromorphic and cytomorphic models. Current state of the art techniques for configuring analog devices to simulate dynamical systems do not consider the current and voltage operating ranges of a...
Article
We present CrowdLearn, a new system that processes an existing corpus of crowdsourced machine learning programs to learn how to generate effective pipelines for solving supervised machine learning problems. CrowdLearn uses a probabilistic model of program likelihood, conditioned on the current sequence of pipeline components and on the characterist...
Conference Paper
We present a new language construct, filtered iterators, for robust input processing. Filtered iterators are designed to eliminate many common input processing errors while enabling robust continued execution. The design is inspired by (1) observed common input processing errors and (2) successful strategies implemented by human developers fixing i...
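A minimal Python analogue of the construct (an illustrative stand-in; the paper introduces filtered iterators as a language construct, not a Python library): yield only the records that parse, discarding malformed input instead of failing mid-stream.

```python
def filtered_ints(lines):
    for line in lines:
        try:
            yield int(line.strip())
        except ValueError:
            continue          # skip the bad record instead of crashing

data = ["1", "oops", " 3 ", "", "5"]
assert list(filtered_ints(data)) == [1, 3, 5]
```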
Article
We present a new system, Genesis, that processes human patches to automatically infer code transforms for automatic patch generation. We present results that characterize the effectiveness of the Genesis inference algorithms and the complete Genesis patch generation system working with real-world patches and defects collected from 372 Java projects...
Article
We present a new technique that infers models of programs that manipulate relational databases. This technique generates test databases and input commands, runs the program, then observes the resulting outputs and updated databases to infer the model. Because the technique works only with the externally observable inputs, outputs, and databases, it...
Conference Paper
We present CodeCarbonCopy (CCC), a system for transferring code from a donor application into a recipient application. CCC starts with functionality identified by the developer to transfer into an insertion point (again identified by the developer) in the recipient. CCC uses paired executions of the donor and recipient on the same input file to obt...
Conference Paper
We present a new system, Genesis, that processes human patches to automatically infer code transforms for automatic patch generation. We present results that characterize the effectiveness of the Genesis inference algorithms and the complete Genesis patch generation system working with real-world patches and defects collected from 372 Java projects...
Article
We present CodeCarbonCopy (CCC), a system for transferring code from a donor application into a recipient application. CCC starts with functionality identified by the developer to transfer into an insertion point (again identified by the developer) in the recipient. CCC uses paired executions of the donor and recipient on the same input file to obt...
Article
As modern computation platforms become increasingly complex, their programming interfaces are increasingly difficult to use. This complexity is especially inappropriate given the relatively simple core functionality that many of the computations implement. We present a new approach for obtaining software that executes on modern computing platforms w...
Article
This survey explores the theory and practice of techniques to make computing systems faster or more energy-efficient by allowing them to make controlled errors. In the same way that systems which only use as much energy as necessary are referred to as being energy-efficient, you can think of the class of systems addressed by this survey as being er...
Book
Error-Efficient Computing Systems explores the theory and practice of techniques to make computing systems faster or more energy-efficient by allowing them to make controlled errors. In the same way that systems which only use as much energy as necessary are referred to as being energy-efficient, the class of systems addressed by this survey can be...
Conference Paper
We present an adaptive binary transformation system for reducing the energy impact of advertisements and analytics in mobile applications. Our approach accommodates both the needs of mobile app developers to obtain income from advertisements and the desire of mobile device users for longer battery life. Our technique automatically identifies recurr...
Article
Emerging high-performance architectures are anticipated to contain unreliable components that may exhibit soft errors, which silently corrupt the results of computations. Full detection and masking of soft errors is challenging, expensive, and, for some applications, unnecessary. For example, approximate computing applications (such as multimedia p...
Article
We present a new system, Genesis, that processes sets of human patches to automatically infer code transforms and search spaces for automatic patch generation. We present results that characterize the effectiveness of the Genesis inference algorithms and the resulting complete Genesis patch generation system working with real-world patches and erro...

Citations

... We see our large-scale analysis on command-line user customizations manifested in alias definitions as a unique window of opportunity to study how the standard environment of the command line could be productively extended, modified, and improved. Our work goes hand in hand with existing efforts to innovate on the experience of command lines that employ techniques from research in systems (Raghavan et al. 2020;Handa et al. 2021), software engineering and programming languages Vasilakis et al. 2020;D'Antoni et al. 2017), human-computer interaction (Vaithilingam and Guo 2019;Gandhi and Gandhi 2020), and artificial intelligence (Agarwal et al. 2020;Lin et al. 2018;Hou et al. 2021). Particularly, our extensive qualitative and quantitative analysis, in conjunction with our dataset, form the basis for identifying opportunities for improving command-line experience in the following directions: by characterizing customization practices, we gain a categorical understanding underlying the needs and wants of command-line users; based on our analysis, we identify opportunities for innovation and formulate them as implications, accompanied with concrete scenarios and examples; further, our comprehensive dataset enables the foundation of learning approaches, as part of learning-based program synthesis (Bruch et al. 2009;Raychev et al. 2014), automated repair (Monperrus 2018), and recommendation systems (Mens and Lozano 2014); finally, we also see our results and datasets as a basis for usability research that can impact the design of tools and the future of the shell in general. ...
... [Table residue: a list of supply-chain safeguards with mean/median usage ratings, including protecting the production branch [85], integrity checks of dependencies through cryptographic hashes [9], [36], [83], [109], [131], [135], [138], maintaining a detailed SBOM [5], [8], [53], [183], [184] and performing SCA [8], [31], [43], [48], [51], [53], [55], [56], code signing [47], [83], [109], [135], [138], [141], Application Security Testing [34], [39], [41], [46], [55], [56], [58], [66], [80], [122], [134], [187], and establishing a vetting process for open-source components hosted in internal/public repositories [15], [16], [32], [134], [188].] Execution is achieved either at runtime, e.g., by embedding the payload in a specific function or initializer, or by poisoning test routines [19]. Differences also exist with regard to code obfuscation and malware detection. ...
... Specifically, APR human studies are not currently well motivated and justified. Only one paper explained why the method they used had been adopted and was appropriate: Cambronero et al. explained that their experiment had been chosen to 'model a scenario, inherent in the use of generate-andvalidate automatic patch generation, in which the developer is given patches that validate but may or may not be correct' [41]. The remaining studies gave no justification for their choice of method and why it might be well suited to their research aims and questions: in such cases, the method was just announced without any motivation. ...
... A different work reported concrete numbers on how the growth of AI is impacting the entire infrastructure of datacenters which need to grow in bandwidth, data storage, and power capacity [22]. While not focusing directly on AI sustainability, in other studies, researchers investigated the impact that utilizing smaller models [23] or down sampled datasets [24] can have on accuracy. Our study paves the way in directly addressing AI sustainability concerns by providing empirical evidence on how dataset modifications can be used to drastically save AI model training energy at a negligible accuracy loss. ...
... Some papers carefully ensure that verification results hold for a floating-point implementation of the network (Katz et al., 2017). A recent paper has shown that verified neural networks in LRA may not really be robust when one considers the bit-level behavior (Jia and Rinard, 2020b). A num-ber of papers have also considered bit-level verification of neural networks, using propositional logic instead of LRA (Jia and Rinard, 2020a;Narodytska et al., 2018). ...
... Insofar approaches on quality assurance of DNNs can be roughly classified into two (complementary) categories: testing (e.g., [4,6,20,32,38,41,45,54,69]) and formal verification (e.g. [15,18,21,23,24,26,28,33,35,39,40,49,50,59,60,67]). The purpose of testing is to disprove the robustness of DNNs by providing adversarial examples (i.e., counterexamples). ...