Shuvendu K. Lahiri's research while affiliated with Microsoft and other places
What is this page?
This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.
It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
If you're a ResearchGate member, you can follow this page to keep up with this author's work.
If you are this author, and you don't want us to display this page anymore, please let us know.
Publications (132)
Large language models (LLMs), such as OpenAI's Codex, have demonstrated their potential to generate code from natural language descriptions across a wide range of programming tasks. Several benchmarks have recently emerged to evaluate the ability of LLMs to generate functionally correct code from natural language intent with respect to a set of hid...
Asynchronous programming is widely adopted for building responsive and efficient software, and modern languages such as C# provide async/await primitives to simplify the use of asynchrony. In this paper, we propose an approach for refactoring a sequential program into an asynchronous program that uses async/await, called asynchronization. The refac...
Pre-trained large language models (LLMs) such as OpenAI Codex have shown immense potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, the code produced does not have any correctness guarantees around satisfying user's intent. In fact, it is hard to define a notion of co...
Large language models (LLMs) have demonstrated an impressive ability to generate code for various programming tasks. In many instances, LLMs can generate a correct program for a task when given numerous trials. Consequently, a recent trend is to do large scale sampling of programs using a model and then filtering/ranking the programs based on the p...
As smart contracts gain adoption in financial transactions, it becomes increasingly important to ensure that they are free of bugs and security vulnerabilities. Of particular relevance in this context are arithmetic overflow bugs, as integers are often used to represent financial assets like account balances. Motivated by this observation, this pap...
In collaborative software development, program merging is the mechanism to integrate changes from multiple programmers. Merge algorithms in modern version control systems report a conflict when changes interfere textually. Merge conflicts require manual intervention and frequently stall modern continuous integration pipelines. Prior work found th...
Program merging is standard practice when developers integrate their individual changes to a common code base. When the merge algorithm fails, this is called a merge conflict. The conflict either manifests in textual merge conflicts where the merge fails to produce code, or semantic merge conflicts where the merged code results in compiler or test...
Static analysis has established itself as a weapon of choice for detecting security vulnerabilities. Taint analysis in particular is a very general and powerful technique, where security policies are expressed in terms of forbidden flows, either from untrusted input sources to sensitive sinks (in integrity policies) or from sensitive sources to unt...
Testing is widely recognized as an important stage of the software development lifecycle. Effective software testing can provide benefits such as documentation, bug finding, and preventing regressions. In particular, unit tests document a unit's intended functionality. A test oracle, typically expressed as a condition, documents...
Collaborative software development is an integral part of the modern software development life cycle, essential to the success of large-scale software projects. When multiple developers make concurrent changes around the same lines of code, a merge conflict may occur. Such conflicts stall pull requests and continuous integration pipelines for hours...
Program merging is ubiquitous in modern software development. Although commonly used in most version control systems, text-based merge algorithms are prone to producing spurious merge conflicts: they report a conflict even when program changes do not interfere with each other semantically. Spurious merge conflicts are costly to development as the n...
Forking is widespread in open-source repositories and causes a significant number of merge conflicts. In this paper, we study the problem of textual merge conflicts from the perspective of Microsoft Edge, a large, highly collaborative fork off the main Chromium branch with significant merge conflicts. Broadly, this study is divid...
Dockerfiles are one of the most prevalent kinds of DevOps artifacts used in industry. Despite their prevalence, there is a lack of sophisticated semantics-aware static analysis of Dockerfiles. In this paper, we introduce a dataset of approximately 178,000 unique Dockerfiles collected from GitHub. To enhance the usability of this data, we describe f...
Ensuring correctness of smart contracts is paramount to ensuring trust in blockchain-based systems. This paper studies the safety and security of smart contracts in the Azure Blockchain Workbench, an enterprise Blockchain-as-a-Service offering from Microsoft. In particular, we formalize semantic conformance of smart contracts against a state machin...
With the growing use of DevOps tools and frameworks, there is an increased need for tools and techniques that support more than code. The current state-of-the-art in static developer assistance for tools like Docker is limited to shallow syntactic validation. We identify three core challenges in the realm of learning from, understanding, and suppor...
The open access two-volume set LNCS 12224 and 12225 constitutes the refereed proceedings of the 32nd International Conference on Computer Aided Verification, CAV 2020, held in Los Angeles, CA, USA, in July 2020.
The 43 full papers, presented together with 18 tool papers and 4 case studies, were carefully reviewed and selected from 240 submissions....
We present an approach for comparing two closely related concurrent programs, whose goal is to give feedback about interesting differences without relying on user-provided assertions. This approach compares two programs in terms of cross-thread interferences and data-flow, under a parametrized abstraction which can detect any difference in the limi...
Many programming tasks require using both domain-specific code and well-established patterns (such as routines concerned with file IO). Together, several small patterns combine to create complex interactions. This compounding effect, mixed with domain-specific idiosyncrasies, creates a challenging environment for fully automatic specification infer...
In this paper, we describe the formal verification of Smart Contracts offered as part of the Azure Blockchain Content and Samples on GitHub. We describe two sources of formal verification problems: (i) semantic conformance checking of smart contracts against a state-machine and access control based Azure Blockchain Workbench application configurati...
With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning techniques. Instead, it is necessary to transform the program to a suitable representation before a learning tech...
Even though many programmers rely on 3-way merge tools to integrate changes from different branches, such tools can introduce subtle bugs in the integration process. This paper aims to mitigate this problem by defining a semantic notion of conflict-freedom, which ensures that the merged program does not introduce new unwanted behaviors. We also sho...
When an evolving program is modified to address issues related to thread synchronization, there is a need to confirm the change is correct, i.e., it does not introduce unexpected behavior. However, manually comparing two programs to identify the semantic difference is labor intensive and error prone, whereas techniques based on model checking are c...
Validating wireless protocol implementations is challenging. Today's approaches require labor-intensive experimental setup and manual trace investigation, but produce poor coverage and inaccurate and irreproducible results. We present VERIFI, the first systematic sniffer-based, model-guided runtime verification framework for wireless protocol imple...
Runtime validation of wireless protocol implementations cannot always employ direct instrumentation of the device under test (DUT). The DUT may not implement the required instrumentation, or the instrumentation may alter the DUT’s behavior when enabled. Wireless sniffers can monitor the DUT’s behavior without instrumentation, but they introduce new...
Even though many programmers rely on 3-way merge tools to integrate changes from different branches, such tools can introduce subtle bugs in the integration process. This paper aims to mitigate this problem by defining a semantic notion of conflict-freedom, which ensures that the merged program does not introduce new unwanted behaviors. We also show...
Approximate computing is an emerging area for trading off the accuracy of an application for improved performance, lower energy costs, and tolerance to unreliable hardware. However, developers must ensure that the leveraged approximations do not introduce significant, intolerable divergence from the reference implementation, as specified by several...
This paper addresses the problem of verifying equivalence between a pair of programs that operate over databases with different schemas. This problem is particularly important in the context of web applications, which typically undergo database refactoring either for performance or maintainability reasons. While web applications should have the sam...
Change-impact analysis (CIA) is the task of determining the set of program elements impacted by a program change. Precise CIA has great potential to avoid expensive testing and code reviews for (parts of) changes that are refactorings (semantics-preserving). However most statement-level CIA techniques suffer from imprecision as they do not incorpor...
For most high level languages, two procedures are equivalent if they transform a pair of isomorphic stores to isomorphic stores. However, tools for modular checking of such equivalence impose a stronger check where isomorphism is strengthened to equality of stores. This results in the inability to prove many interesting program pairs with recursion...
Change-impact analysis (CIA) is the task of determining the set of program elements impacted by a program change. Precise CIA has great potential to avoid expensive testing and code reviews for (parts of) changes that are refactorings (semantics-preserving). Existing CIA is imprecise because it is coarse-grained, deals with only few refactoring pat...
Equivalence checking of imperative programs has several applications including compiler validation and cross-version verification. Debugging equivalence failures can be tedious for large examples, especially for low-level binary programs. In this paper, we formalize a simple yet precise notion of verifiable rootcause for equivalence failures that l...
Verification of open programs can be challenging in the presence of an unconstrained environment. Verifying properties that depend on the environment yields a large class of uninteresting false alarms. Using a verifier on a program thus requires extensive initial investment in modeling the environment of the program. We propose a technique called a...
To overcome the difficulties inherent in traditional compiler validating methods, a new technique is herein provided for validating compiler output via program verification. In one embodiment, this technique is implemented as an automated tool that merges both a source program and the compiler-generated target program into a single (intermediate) p...
This document describes a unified type checker and property checker for a low level program's heap and its types. The type checker can use the full power of the property checker to express and verify subtle, program specific type and memory safety invariants well beyond what the native low level program system can check. Meanwhile, the property che...
The claimed subject matter provides a method for performing a static analysis of concurrent programs. The method includes determining that a static analysis of the first concurrent program generates a warning for an input. The method also includes determining whether a static analysis of the second concurrent program generates the warning for the i...
We introduce Verification Modulo Versions (VMV), a new static analysis technique for reducing the number of alarms reported by static verifiers while providing sound semantic guarantees. First, VMV extracts semantic environment conditions from a base program P. Environmental conditions can either be sufficient conditions (implying the safety of P)...
Concepts and technologies are described herein for incremental compositional dynamic test generation. The concepts and technologies described herein are used to increase the code coverage and security vulnerability identification abilities of testing applications and devices, without significantly increasing, and in some cases decreasing, computati...
A previous version of a program can be a powerful enabler for program analysis by defining new relative specifications and making the results of current program analysis more relevant. In this paper, we describe the approach of differential assertion checking (DAC) for comparing different versions of a program with respect to a set of assertions. DAC...
This paper describes a cross-version compiler validator and measures its effectiveness on the CLR JIT compiler. The validator checks for semantically equivalent assembly language output from various versions of the compiler, including versions across a seven-month time period, across two architectures (x86 and ARM), across two compilation scenarios...
Modular assertion checkers are plagued with false alarms due to the need for precise environment specifications (preconditions and callee postconditions). Even the fully precise checkers report assertion failures under the most demonic environments allowed by unconstrained or partial specifications. The inability to preclude overly adversarial envi...
In this paper, we present a general framework for modularly comparing two (imperative) programs that can leverage single-program verifiers based on automated theorem provers. We formalize (i) mutual summaries for comparing the summaries of two programs, and (ii) relative termination to describe conditions under which two programs relatively termina...
This paper describes our experience of performing a reactive security audit of known security vulnerabilities in core operating system and browser COM components, using an extended static checker HAVOCLITE. We describe the extensions made to the tool to be applicable on such large C++ components, along with our experience of using an extended static...
Consider a sequential programming language with control flow constructs such as assignments, choice, loops, and procedure calls. We restrict the syntax of expressions in this language to one that can be efficiently decided by a satisfiability-modulo-theories solver. For such a language, we define the problem of deciding whether a program can reach...
In this paper, we describe SymDiff, a language-agnostic tool for equivalence checking and displaying semantic (behavioral) differences over imperative programs. The tool operates at the level of an intermediate verification language Boogie, for which translations exist from various source languages such as C, C# and x86. We discuss the tool and the...
Static assertion checking of open programs requires setting up a precise harness to capture the environment assumptions. For instance, a library may require a file handle to be properly initialized before it is passed into it. A harness is used to set up or specify the appropriate preconditions before invoking methods from the program. In the absen...
We review, compare and discuss several approaches for representing programs by logic formulas, such as symbolic model checking, bounded model checking, verification-condition generation, and symbolic-execution-based test generation.
Dynamic test generation consists of running a program while simultaneously executing the program symbolically in order to gather constraints on inputs from conditional statements encountered along the execution. Those constraints are then systematically negated and solved with a constraint solver, generating new test inputs to exercise different exec...
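The loop described in this abstract (run, collect branch constraints, negate one, solve, rerun) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the toy `program`, the `branch` helper, and the brute-force `solve` over a small integer domain are all assumptions made for illustration, standing in for real symbolic execution and a real constraint solver.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Predicate = Callable[[int], bool]
Trace = List[Tuple[Predicate, bool]]  # (branch condition, outcome taken)

def program(x: int, trace: Trace) -> str:
    """Hypothetical program under test; every branch is recorded in trace."""
    def branch(pred: Predicate) -> bool:
        outcome = pred(x)
        trace.append((pred, outcome))
        return outcome

    if branch(lambda v: v > 10):
        if branch(lambda v: v % 7 == 0):
            return "bug"
        return "big"
    return "small"

def solve(constraints: Trace, domain: Iterable[int]) -> Optional[int]:
    """Stand-in for a constraint solver: brute-force search over a finite domain."""
    for candidate in domain:
        if all(pred(candidate) == want for pred, want in constraints):
            return candidate
    return None

def explore(prog: Callable[[int, Trace], str], seed: int,
            domain: Iterable[int]) -> Dict[int, str]:
    """Run prog, then negate each branch on the path to derive new inputs."""
    domain = list(domain)
    worklist = [seed]
    seen_paths = set()
    results: Dict[int, str] = {}
    while worklist:
        x = worklist.pop()
        trace: Trace = []
        result = prog(x, trace)
        path = tuple(outcome for _, outcome in trace)
        if path in seen_paths:
            continue  # this execution path was already exercised
        seen_paths.add(path)
        results[x] = result
        for i in range(len(trace)):
            # Keep the first i branch outcomes, flip the outcome of branch i.
            flipped = trace[:i] + [(trace[i][0], not trace[i][1])]
            new_input = solve(flipped, domain)
            if new_input is not None:
                worklist.append(new_input)
    return results

print(explore(program, 0, range(-20, 50)))
# prints {0: 'small', 11: 'big', 14: 'bug'}
```

Starting from the single seed input 0, the sketch reaches all three paths of the toy program, including the one guarded by two nested conditions.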
In this paper, we describe a few challenges that accompany SMT-based precise verification of systems code (device drivers, file systems) written in low-level languages such as C/C++. First, the presence of pointer arithmetic and untrusted casts make type checking difficult; we show how to formalize C type safety checking and exploit the types for d...
Verification of large programs is impossible without proof techniques that allow local reasoning and information hiding. In this paper, we take the approach of modeling the heap as a collection of partial functions with disjoint domains. We call each such partial function a linear map. Programmers may select objects from linear maps, update linear...
The problem of preserving unmodified facts across a statement is often referred to as the frame problem in program verification. The problem manifests most often while reasoning about procedure calls, because of scoping and modification of possibly unbounded set of heap locations. Existing approaches to deal with the frame problem are either too...
Program verifiers based on first-order theorem provers model the program heap as a collection of mutable maps. In such verifiers, preserving unmodified facts about the heap across procedure calls is difficult because of scoping and modification of possibly unbounded set of heap locations. Existing approaches to deal with this problem are either too...
Houdini is a simple yet scalable technique for annotation inference for modular contract checking. The input to Houdini is a set of candidate annotations, and the output is a consistent subset of these candidates. Since this technique is most useful as an annotation assistant for user-guided refinement of annotations, understanding the reason for t...
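The input/output behaviour described in this abstract (candidate annotations in, a consistent subset out) can be illustrated with a minimal Python sketch. It is an assumption-laden toy rather than the tool itself: the modular checker is replaced by brute-force enumeration over a small finite state space, the "program" is a single hypothetical loop step, and the names `houdini` and `CANDIDATES` are made up for the example.

```python
from typing import Callable, Dict, Iterable, Set

Candidate = Callable[[int], bool]

# Hypothetical candidate invariants for a loop over an integer x.
CANDIDATES: Dict[str, Candidate] = {
    "nonneg": lambda x: x >= 0,
    "even": lambda x: x % 2 == 0,
    "below_five": lambda x: x < 5,
    "not_three": lambda x: x != 3,
}

def houdini(candidates: Dict[str, Candidate],
            init_states: Iterable[int],
            step: Callable[[int], int],
            universe: Iterable[int]) -> Set[str]:
    """Return the largest subset of candidates that holds initially and is
    preserved by step, assuming the whole subset holds before the step."""
    universe = list(universe)
    kept = set(candidates)
    # Drop candidates that are violated by some initial state.
    for s in init_states:
        kept = {name for name in kept if candidates[name](s)}
    changed = True
    while changed:
        changed = False
        for s in universe:
            # Only states satisfying every surviving candidate are relevant.
            if all(candidates[name](s) for name in kept):
                t = step(s)
                refuted = {name for name in kept if not candidates[name](t)}
                if refuted:
                    kept -= refuted
                    changed = True
    return kept

# Toy loop body: x := (x + 2) mod 10, starting from x = 0.
result = houdini(CANDIDATES, [0], lambda x: (x + 2) % 10, range(10))
print(sorted(result))
# prints ['even', 'nonneg', 'not_three']
```

In this toy run `below_five` is dropped because the step from 4 leads to 6, while `not_three` survives only because `even` is kept alongside it, which is why the result is a mutually consistent subset and not a set of independently checked facts.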
It is widely believed that program analysis can be more closely targeted to the needs of programmers if the program is accompanied by further redundant documentation. This may include regression test suites, API protocol usage, and code contracts. To this should be added the largest and most redundant text of all: the previous version of the same p...
Verification of large multithreaded programs is challenging. Automatic approaches cannot overcome the state explosion in the number of threads; semi-automatic methods require expensive human time for finding global inductive invariants. Ideally, automatic methods should not deal with the composition of the original threads and a human should not su...
A frame rule enables local reasoning by providing a mechanism to preserve facts that are unchanged across a procedure call. Traditionally, it has been difficult to exploit the frame rule in classical first-order assertion logics that model the program heap as an uninterpreted map from addresses to values. Therefore, although classical logics all...
A typical software module evolves through many versions over the course of its development. To maintain compatibility with module clients, it is crucial that a module's behavior at its interface does not change in an undesirable manner across versions. The problem of introducing changes which break interface behavior remains one of the most daunti...
Theorem-prover based modular checkers have the potential to perform scalable and precise checking of user-defined properties by combining path-sensitive intraprocedural reasoning with user-defined procedure abstractions. However, such tools have seldom been deployed on large software applications of industrial relevance due to the annotation burden...
In this paper, we investigate the asymptotic complexity of various predicate abstraction problems relative to the asymptotic complexity of checking an annotated program in a given assertion logic. Unlike previous approaches, we pose the predicate abstraction problem as a decision problem, instead of the traditional inference problem. For asse...
Contract-based property checkers hold the potential for precise, scalable, and incremental reasoning. However, it is difficult to apply such checkers to large program modules because they require programmers to provide detailed contracts, including an interface specification, module invariants, and internal specifications. We argue that given a suitab...
Context-bounded analysis is an attractive approach to verification of concurrent programs. Bounding the number of contexts executed per thread not only reduces the asymptotic complexity, but also the complexity increases gradually from checking a purely sequential program. Lal and Reps [14] provided a method for reducing the context-bounded verific...
We present a unified approach to type checking and property checking for low-level code. Type checking for low-level code is challenging because type safety often depends on complex, program-specific invariants that are difficult for traditional type checkers to express. Conversely, property checking for low-level code is challenging because it is...
Reasoning about program heap, especially if it involves handling unbounded, dynamically heap-allocated data structures such as linked lists and arrays, is challenging. Furthermore, sound analysis that precisely models heap becomes significantly more challenging in the presence of low-level pointer manipulation that is prevalent in systems software....
We present a case study in which a team of test engineers at Microsoft applied a feedback-directed random testing tool to a critical component of the .NET architecture. Due to its complexity and high reliability requirements, the component had already been tested by 40 test engineers over five years, using manual testing and many automated testin...
This paper takes a fresh look at the problem of precise verification of heap-manipulating programs using first-order Satisfiability-Modulo-Theories (SMT) solvers. We augment the specification logic of such solvers by introducing the Logic of Interpreted Sets and Bounded Quantification for specifying properties of heap-manipulating programs. Our log...
We present a technique that improves random test generation by incorporating feedback obtained from executing test inputs as they are created. Our technique builds inputs incrementally by randomly selecting a method call to apply and finding arguments from among previously-constructed inputs. As soon as an input is built, it is executed and checked...
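The incremental construction described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' tool: the operations in `OPS`, the `contract` callback, and the function name `generate` are hypothetical, and integers stand in for the objects and method-call sequences a real generator would build.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple

Operation = Tuple[str, Callable[..., Any], int]  # (name, function, arity)

# Hypothetical operations under test; floordiv raises on a zero divisor.
OPS: List[Operation] = [
    ("add", lambda a, b: a + b, 2),
    ("sub", lambda a, b: a - b, 2),
    ("floordiv", lambda a, b: a // b, 2),
]

def generate(ops: Sequence[Operation],
             seeds: Sequence[Any],
             contract: Callable[[Any], bool],
             rounds: int,
             rng: random.Random) -> Tuple[List[Any], List[Tuple[str, list, str]]]:
    """Grow a pool of values by applying random operations to earlier values."""
    pool = list(seeds)
    failures = []
    for _ in range(rounds):
        name, fn, arity = rng.choice(ops)
        # Arguments come only from previously constructed inputs.
        args = [rng.choice(pool) for _ in range(arity)]
        try:
            value = fn(*args)  # execute the new input as soon as it is built
        except Exception as exc:
            failures.append((name, args, repr(exc)))
            continue  # feedback: never extend an input that raised
        if not contract(value):
            failures.append((name, args, "contract violated"))
            continue  # feedback: report it, but do not build on it
        if value not in pool:
            pool.append(value)  # feedback: keep only non-redundant results
    return pool, failures

pool, failures = generate(OPS, [0, 1], lambda v: isinstance(v, int),
                          rounds=200, rng=random.Random(0))
print(len(pool), "values in pool;", len(failures), "failing calls recorded")
```

The design choice worth noting is that execution feedback prunes the search: calls that raise or violate the contract are recorded but never reused as arguments, and redundant results are not added to the pool again.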
We present a new approach for performing predicate abstraction based on symbolic decision procedures. Intuitively, a symbolic decision procedure for a theory takes a set of predicates in the theory and symbolically executes a decision procedure on all the subsets over the set of predicates. The result of the symbolic decision procedure is a shared...
Reasoning about heap-allocated data structures such as linked lists and arrays is challenging. The reachability predicate has proved to be useful for reasoning about the heap in type-safe languages where memory is manipulated by dereferencing object fie