Shuvendu K. Lahiri's research while affiliated with Microsoft and other places

Publications (132)

Preprint
Full-text available
Large language models (LLMs), such as OpenAI's Codex, have demonstrated their potential to generate code from natural language descriptions across a wide range of programming tasks. Several benchmarks have recently emerged to evaluate the ability of LLMs to generate functionally correct code from natural language intent with respect to a set of hid...
Chapter
Asynchronous programming is widely adopted for building responsive and efficient software, and modern languages such as C# provide async/await primitives to simplify the use of asynchrony. In this paper, we propose an approach for refactoring a sequential program into an asynchronous program that uses async/await, called asynchronization. The refac...
Preprint
Full-text available
Asynchronous programming is widely adopted for building responsive and efficient software, and modern languages such as C# provide async/await primitives to simplify the use of asynchrony. In this paper, we propose an approach for refactoring a sequential program into an asynchronous program that uses async/await, called asynchronization. The refac...
Preprint
Full-text available
Pre-trained large language models (LLMs) such as OpenAI Codex have shown immense potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, the code produced does not have any correctness guarantees around satisfying user's intent. In fact, it is hard to define a notion of co...
Preprint
Full-text available
Large language models (LLMs) have demonstrated an impressive ability to generate code for various programming tasks. In many instances, LLMs can generate a correct program for a task when given numerous trials. Consequently, a recent trend is to do large scale sampling of programs using a model and then filtering/ranking the programs based on the p...
Article
As smart contracts gain adoption in financial transactions, it becomes increasingly important to ensure that they are free of bugs and security vulnerabilities. Of particular relevance in this context are arithmetic overflow bugs, as integers are often used to represent financial assets like account balances. Motivated by this observation, this pap...
Article
In collaborative software development, program merging is the mechanism to integrate changes from multiple programmers. Merge algorithms in modern version control systems report a conflict when changes interfere textually. Merge conflicts require manual intervention and frequently stall modern continuous integration pipelines. Prior work found th...
Preprint
Program merging is standard practice when developers integrate their individual changes to a common code base. When the merge algorithm fails, this is called a merge conflict. The conflict either manifests in textual merge conflicts where the merge fails to produce code, or semantic merge conflicts where the merged code results in compiler or test...
Preprint
Static analysis has established itself as a weapon of choice for detecting security vulnerabilities. Taint analysis in particular is a very general and powerful technique, where security policies are expressed in terms of forbidden flows, either from untrusted input sources to sensitive sinks (in integrity policies) or from sensitive sources to unt...
Preprint
As smart contracts gain adoption in financial transactions, it becomes increasingly important to ensure that they are free of bugs and security vulnerabilities. Of particular relevance in this context are arithmetic overflow bugs, as integers are often used to represent financial assets like account balances. Motivated by this observation, this pap...
Preprint
Full-text available
Testing is widely recognized as an important stage of the software development lifecycle. Effective software testing can provide benefits such as documentation, bug finding, and preventing regressions. In particular, unit tests document a unit's \textit{intended} functionality. A \textit{test oracle}, typically expressed as an condition, documents...
Preprint
Collaborative software development is an integral part of the modern software development life cycle, essential to the success of large-scale software projects. When multiple developers make concurrent changes around the same lines of code, a merge conflict may occur. Such conflicts stall pull requests and continuous integration pipelines for hours...
Preprint
Full-text available
Program merging is ubiquitous in modern software development. Although commonly used in most version control systems, text-based merge algorithms are prone to producing spurious merge conflicts: they report a conflict even when program changes do not interfere with each other semantically. Spurious merge conflicts are costly to development as the n...
Preprint
Full-text available
Forking structure is widespread in the open-source repositories and that causes a significant number of merge conflicts. In this paper, we study the problem of textual merge conflicts from the perspective of Microsoft Edge, a large, highly collaborative fork off the main Chromium branch with significant merge conflicts. Broadly, this study is divid...
Preprint
Dockerfiles are one of the most prevalent kinds of DevOps artifacts used in industry. Despite their prevalence, there is a lack of sophisticated semantics-aware static analysis of Dockerfiles. In this paper, we introduce a dataset of approximately 178,000 unique Dockerfiles collected from GitHub. To enhance the usability of this data, we describe f...
Chapter
Ensuring correctness of smart contracts is paramount to ensuring trust in blockchain-based systems. This paper studies the safety and security of smart contracts in the Azure Blockchain Workbench, an enterprise Blockchain-as-a-Service offering from Microsoft. In particular, we formalize semantic conformance of smart contracts against a state machin...
Preprint
With the growing use of DevOps tools and frameworks, there is an increased need for tools and techniques that support more than code. The current state-of-the-art in static developer assistance for tools like Docker is limited to shallow syntactic validation. We identify three core challenges in the realm of learning from, understanding, and suppor...
Book
Full-text available
The open access two-volume set LNCS 12224 and 12225 constitutes the refereed proceedings of the 32st International Conference on Computer Aided Verification, CAV 2020, held in Los Angeles, CA, USA, in July 2020.* The 43 full papers presented together with 18 tool papers and 4 case studies, were carefully reviewed and selected from 240 submissions....
Book
Full-text available
The open access two-volume set LNCS 12224 and 12225 constitutes the refereed proceedings of the 32st International Conference on Computer Aided Verification, CAV 2020, held in Los Angeles, CA, USA, in July 2020.* The 43 full papers presented together with 18 tool papers and 4 case studies, were carefully reviewed and selected from 240 submissions....
Article
Full-text available
We present an approach for comparing two closely related concurrent programs, whose goal is to give feedback about interesting differences without relying on user-provided assertions. This approach compares two programs in terms of cross-thread interferences and data-flow, under a parametrized abstraction which can detect any difference in the limi...
Preprint
Many programming tasks require using both domain-specific code and well-established patterns (such as routines concerned with file IO). Together, several small patterns combine to create complex interactions. This compounding effect, mixed with domain-specific idiosyncrasies, creates a challenging environment for fully automatic specification infer...
Preprint
Full-text available
In this paper, we describe the formal verification of Smart Contracts offered as part of the Azure Blockchain Content and Samples on github. We describe two sources of formal verification problems: (i) semantic conformance checking of smart contracts against a state-machine and access control based Azure Blockchain Workbench application configurati...
Conference Paper
With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning techniques. Instead, it is necessary to transform the program to a suitable representation before a learning tech...
Article
Full-text available
Even though many programmers rely on 3-way merge tools to integrate changes from different branches, such tools can introduce subtle bugs in the integration process. This paper aims to mitigate this problem by defining a semantic notion of conflict-freedom, which ensures that the merged program does not introduce new unwanted behaviors. We also sho...
Conference Paper
When an evolving program is modified to address issues related to thread synchronization, there is a need to confirm the change is correct, i.e., it does not introduce unexpected behavior. However, manually comparing two programs to identify the semantic difference is labor intensive and error prone, whereas techniques based on model checking are c...
Preprint
Validating wireless protocol implementations is challenging. Today's approaches require labor-intensive experimental setup and manual trace investigation, but produce poor coverage and inaccurate and irreproducible results. We present VERIFI, the first systematic sniffer-based, model-guided runtime verification framework for wireless protocol imple...
Article
Full-text available
Runtime validation of wireless protocol implementations cannot always employ direct instrumentation of the device under test (DUT). The DUT may not implement the required instrumentation, or the instrumentation may alter the DUT’s behavior when enabled. Wireless sniffers can monitor the DUT’s behavior without instrumentation, but they introduce new...
Preprint
When an evolving program is modified to address issues related to thread synchronization, there is a need to confirm the change is correct, i.e., it does not introduce unexpected behavior. However, manually comparing two programs to identify the semantic difference is labor intensive and error prone, whereas techniques based on model checking are c...
Article
With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning techniques. Instead, it is necessary to transform the program to a suitable representation before a learning tech...
Article
Even though many programmers rely on 3-way merge tools to integrate changes from different branches, such tools can introduce subtle bugs in the integration process. This paper aims to mitigate this problem by defining a semantic notion of confict-freedom, which ensures that the merged program does not introduce new unwanted behaviors. We also show...
Article
Full-text available
Approximate computing is an emerging area for trading off the accuracy of an application for improved performance, lower energy costs, and tolerance to unreliable hardware. However, developers must ensure that the leveraged approximations do not introduce significant, intolerable divergence from the reference implementation, as specified by several...
Article
This paper addresses the problem of verifying equivalence between a pair of programs that operate over databases with different schemas. This problem is particularly important in the context of web applications, which typically undergo database refactoring either for performance or maintainability reasons. While web applications should have the sam...
Conference Paper
We present an approach for comparing two closely related concurrent programs, whose goal is to give feedback about interesting differences without relying on user-provided assertions. This approach compares two programs in terms of cross-thread interferences and data-flow, under a parametrized abstraction which can detect any difference in the limi...
Conference Paper
Change-impact analysis (CIA) is the task of determining the set of program elements impacted by a program change. Precise CIA has great potential to avoid expensive testing and code reviews for (parts of) changes that are refactorings (semantics-preserving). However most statement-level CIA techniques suffer from imprecision as they do not incorpor...
Conference Paper
For most high level languages, two procedures are equivalent if they transform a pair of isomorphic stores to isomorphic stores. However, tools for modular checking of such equivalence impose a stronger check where isomorphism is strengthened to equality of stores. This results in the inability to prove many interesting program pairs with recursion...
Article
Change-impact analysis (CIA) is the task of determining the set of program elements impacted by a program change. Precise CIA has great potential to avoid expensive testing and code reviews for (parts of) changes that are refactorings (semantics-preserving). Existing CIA is imprecise because it is coarse-grained, deals with only few refactoring pat...
Conference Paper
Runtime validation of wireless protocol implementations cannot always employ direct instrumentation of the device under test (DUT). The DUT may not implement the required instrumentation, or the instrumentation may alter the DUT’s behavior when enabled. Wireless sniffers can monitor the DUT’s behavior without instrumentation, but they introduce new...
Conference Paper
Approximate computing is an emerging area for trading off the accuracy of an application for improved performance, lower energy costs, and tolerance to unreliable hardware. However, developers must ensure that the leveraged approximations do not introduce significant, intolerable divergence from the reference implementation, as specified by several...
Conference Paper
Equivalence checking of imperative programs has several applications including compiler validation and cross-version verification. Debugging equivalence failures can be tedious for large examples, especially for low-level binary programs. In this paper, we formalize a simple yet precise notion of verifiable rootcause for equivalence failures that l...
Conference Paper
Full-text available
Verification of open programs can be challenging in the presence of an unconstrained environment. Verifying properties that depend on the environment yields a large class of uninteresting false alarms. Using a verifier on a program thus requires extensive initial investment in modeling the environment of the program. We propose a technique called a...
Patent
Full-text available
To overcome the difficulties inherent in traditional compiler validating methods, a new technique is herein provided for validating compiler output via program verification. In one embodiment, this technique is implemented as an automated tool that merges both a source program and the compiler-generated target program into a single (intermediate) p...
Patent
This document describes a unified type checker and property checker for a low level program's heap and its types. The type checker can use the full power of the property checker to express and verify subtle, program specific type and memory safety invariants well beyond what the native low level program system can check. Meanwhile, the property che...
Patent
Full-text available
The claimed subject matter provides a method for performing a static analysis of concurrent programs. The method includes determining that a static analysis of the first concurrent program generates a warning for an input. The method also includes determining whether a static analysis of the second concurrent program generates the warning for the i...
Article
We introduce Verification Modulo Versions (VMV), a new static analysis technique for reducing the number of alarms reported by static verifiers while providing sound semantic guarantees. First, VMV extracts semantic environment conditions from a base program P. Environmental conditions can either be sufficient conditions (implying the safety of P)...
Article
We introduce Verification Modulo Versions (VMV), a new static analysis technique for reducing the number of alarms reported by static verifiers while providing sound semantic guarantees. First, VMV extracts semantic environment conditions from a base program P. Environmental conditions can either be sufficient conditions (implying the safety of P)...
Patent
Concepts and technologies are described herein for incremental compositional dynamic test generation. The concepts and technologies described herein are used to increase the code coverage and security vulnerability identification abilities of testing applications and devices, without significantly increasing, and in some cases decreasing, computati...
Conference Paper
Previous version of a program can be a powerful enabler for program analysis by defining new relative specifications and making the results of current program analysis more relevant. In this paper, we describe the approach of differential assertion checking (DAC) for comparing different versions of a program with respect to a set of assertions. DAC...
Conference Paper
This paper describes a cross-version compiler validator and measures its effectiveness on the CLR JIT compiler. The validator checks for semantically equivalent assembly language output from various versions of the compiler, including versions across a seven-month time period, across two architectures (x86 and ARM), across two compilation scenarios...
Conference Paper
Modular assertion checkers are plagued with false alarms due to the need for precise environment specifications (preconditions and callee postconditions). Even the fully precise checkers report assertion failures under the most demonic environments allowed by unconstrained or partial specifications. The inability to preclude overly adversarial envi...
Conference Paper
Modular assertion checkers are plagued with false alarms due to the need for precise environment specifications (preconditions and callee postconditions). Even the fully precise checkers report assertion failures under the most demonic environments allowed by unconstrained or partial specifications. The inability to preclude overly adversarial envi...
Conference Paper
In this paper, we present a general framework for modularly comparing two (imperative) programs that can leverage single-program verifiers based on automated theorem provers. We formalize (i) mutual summaries for comparing the summaries of two programs, and (ii) relative termination to describe conditions under which two programs relatively termina...
Conference Paper
This paper describes our experience of performing reactive security audit of known security vulnerabilities in core operating system and browser COM components, using an extended static checker HAVOCLITE. We describe the extensions made to the tool to be applicable on such large C++ components, along with our experience of using an extended static...
Conference Paper
Full-text available
Consider a sequential programming language with control flow constructs such as assignments, choice, loops, and procedure calls. We restrict the syntax of expressions in this language to one that can be efficiently decided by a satisfiability-modulo-theories solver. For such a language, we define the problem of deciding whether a program can reach...
Conference Paper
In this paper, we describe SymDiff, a language-agnostic tool for equivalence checking and displaying semantic (behavioral) differences over imperative programs. The tool operates at the level of an intermediate verification language Boogie, for which translations exist from various source languages such as C, C# and x86. We discuss the tool and the...
Conference Paper
Full-text available
Static assertion checking of open programs requires setting up a precise harness to capture the environment assumptions. For instance, a library may require a file handle to be properly initialized before it is passed into it. A harness is used to set up or specify the appropriate preconditions before invoking methods from the program. In the absen...
Chapter
We review, compare and discuss several approaches for representing programs by logic formulas, such as symbolic model checking, bounded model checking, verification-condition generation, and symbolic-execution-based test generation.
Conference Paper
Dynamic test generation consists of running a programwhile simultaneously executing the program symbolically in order to gather constrains on inputs from conditional statements encountered along the execution. Those constraints are then systematically negated and solved with a constraint solver, generating new test inputs to exercise different exec...
Conference Paper
In this paper, we describe a few challenges that accompany SMT-based precise verification of systems code (device drivers, file systems) written in low-level languages such as C/C++. First, the presence of pointer arithmetic and untrusted casts make type checking difficult; we show how to formalize C type safety checking and exploit the types for d...
Conference Paper
Full-text available
Verification of large programs is impossible without proof techniques that allow local reasoning and information hiding. In this paper, we take the approach of modeling the heap as a collection of partial functions with disjoint domains. We call each such partial function a linear map. Programmers may select objects from linear maps, update linear...
Article
Full-text available
The problem of preserving unmodified facts across a state-ment is often referred to as the frame problem in program verification. The problem manifests most often while reasoning about procedure calls, because of scoping and modification of possibly unbounded set of heap locations. Existing approaches to deal with the frame problem are ei-ther too...
Conference Paper
Full-text available
Program verifiers based on first-order theorem provers model the program heap as a collection of mutable maps. In such verifiers, preserving unmodified facts about the heap across procedure calls is difficult because of scoping and modification of possibly unbounded set of heap locations. Existing approaches to deal with this problem are either too...
Conference Paper
Houdini is a simple yet scalable technique for annotation inference for modular contract checking. The input to Houdini is a set of candidate annotations, and the output is a consistent subset of these candidates. Since this technique is most useful as an annotation assistant for user-guided refinement of annotations, understanding the reason for t...
Conference Paper
It is widely believed that program analysis can be more closely targeted to the needs of programmers if the program is accompanied by further redundant documentation. This may include regression test suites, API protocol usage, and code contracts. To this should be added the largest and most redundant text of all: the previous version of the same p...
Conference Paper
Verification of large multithreaded programs is challenging. Automatic approaches cannot overcome the state explosion in the number of threads; semi-automatic methods require expensive human time for finding global inductive invariants. Ideally, automatic methods should not deal with the composition of the original threads and a human should not su...
Article
Full-text available
A frame rule enables local reasoning by providing a mechanism to preserve facts that are unchanged across a procedure call. Tradi-tionally, it has been difficult to exploit the frame rule in classical first-order assertion logics that model the program heap as an unin-terpreted map from addresses to values. Therefore, although clas-sical logics all...
Article
A typical software module evolves through many versions over the course of its development. To maintain compatibility with mod-ule clients, it is crucial that a module's behavior at its interface does not change in an undesirable manner across versions. The problem of introducing changes which break interface behavior remains one of the most daunti...
Conference Paper
Full-text available
Theorem-prover based modular checkers have the potential to perform scalable and precise checking of user-defined properties by combining pathsensitive intraprocedural reasoning with user-defined procedure abstractions. However, such tools have seldom been deployed on large software applications of industrial relevance due to the annotation burden...
Conference Paper
Full-text available
In this paper, we investigate the asymptotic complexity of various predicate abstraction problems relative to the asymptotic com- plexity of checking an annotated program in a given assertion logic. Un- like previous approaches, we pose the predicate abstraction problem as a decision problem, instead of the traditional inference problem. For as- se...
Conference Paper
Full-text available
Contract-based property checkers hold the potential for pre- cise, scalable, and incremental reasoning. However, it is dicult to apply such checkers to large program modules because they require program- mers to provide detailed contracts, including an interface specication, module invariants, and internal specications. We argue that given a suitab...
Conference Paper
Full-text available
Context-bounded analysis is an attractive approach to verification of concurrent programs. Bounding the number of contexts executed per thread not only reduces the asymptotic complexity, but also the complexity increases gradually from checking a purely sequential program. Lal and Reps[14] provided a method for reducing the context-bounded verific...
Conference Paper
Full-text available
We present a unified approach to type checking and property checking for low-level code. Type checking for low-level code is challenging because type safety often depends on complex, program-specific invariants that are difficult for traditional type checkers to express. Conversely, property checking for low-level code is challenging because it is...
Article
Full-text available
Reasoning about program heap, especially if it involves handling unbounded, dynamically heap-allocated data structures such as linked lists and arrays, is challenging. Furthermore, sound analysis that precisely models heap becomes significantly more challenging in the presence of low-level pointer manipulation that is prevalent in systems software....
Conference Paper
We present a case study in which a team of test engineers at Microsoft applied a feedback-directed random testing tool to a critical component of the .NET architecture. Due to its complexity and high reliability requirements, the compo- nent had already been tested by 40 test engineers over five years, using manual testing and many automated testin...
Article
This paper takes a fresh look at the problem of precise verification of heap-manipulating programs using first-order Satisfiability-Modulo-Theories (SMT) solvers. We augment the specification logic of such solvers by introducing the Logic of Interpreted Sets and Bounded Quantification for specifying properties of heap-manipulating programs. Our log...
Conference Paper
Full-text available
This paper takes a fresh look at the problem of precise verifica- tion of heap-manipulating programs using first-order Satis fiability- Modulo-Theories (SMT) solvers. We augment the specificatio n logic of such solvers by introducing the Logic of Interpreted Sets and Bounded Quantification for specifying properties of heap- manipulating programs. O...
Conference Paper
We present a technique that improves random test generation by incorporating feedback obtained from executing test inputs as they are created. Our technique builds inputs incrementally by randomly selecting a method call to apply and finding arguments from among previously-constructed inputs. As soon as an input is built, it is executed and checked...
Article
Full-text available
We present a new approach for performing predicate abstraction based on symbolic decision procedures. Intuitively, a symbolic decision procedure for a theory takes a set of predicates in the theory and symbolically executes a decision procedure on all the subsets over the set of predicates. The result of the symbolic decision procedure is a shared...
Conference Paper
Full-text available
Reasoning about heap-allocated data structures such as linked lists and arrays is challenging. The reachability predicate has proved to be useful for reasoning about the heap in type-safe languages where memory is manipulated by dereferencing object fie