Conference Paper

SMACK: Decoupling Source Language Details from Verifier Implementations

Authors: Zvonimir Rakamarić and Michael Emmi

Abstract

A major obstacle to putting software verification research into practice is the high cost of developing the infrastructure enabling the application of verification algorithms to actual production code, in all of its complexity. Handling an entire programming language is a huge endeavor that few researchers are willing to undertake; even fewer could invest the effort to implement a verification algorithm for many source languages. To decouple the implementations of verification algorithms from the details of source languages, and enable rapid prototyping on production code, we have developed SMACK. At its core, SMACK is a translator from the LLVM intermediate representation (IR) into the Boogie intermediate verification language (IVL). Sourcing LLVM IR exploits an increasing number of compiler front ends, optimizations, and analyses. Targeting Boogie exploits a canonical platform which simplifies the implementation of algorithms for verification, model checking, and abstract interpretation. Our initial experience in verifying C-language programs is encouraging: SMACK is competitive in SV-COMP benchmarks, is able to translate large programs (100 KLOC), and is being used in several verification research prototypes.
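For concreteness, here is a minimal sketch (ours, not from the paper) of the kind of C program such a pipeline consumes: Clang compiles the source to LLVM IR, the translator emits a Boogie procedure, and a backend verifier discharges the assertion.

#include <assert.h>

// Hedged illustration of a SMACK-style verification input: a plain C
// program whose assertion becomes a Boogie assert after translation.
int abs_val(int x) {
  return x < 0 ? -x : x;
}

int main(void) {
  int y = abs_val(-5);
  assert(y == 5); // verified (or refuted) by the Boogie backend
  return 0;
}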


... Hence, we automatically generate the first-order logic formula (in SMT-LIB format) directly from the verifier's C source code. Modeling C code in general is hard [42,46,64]. However, we observe that it is sufficient to handle a subset of C for the verifier's value-tracking routines. ...
... C to First-order Logic. Similar to our approach that generates first-order logic formulas from C code, prior tools also generate verification conditions from C code [42,46,54,64]. A few of them, SMACK [64] and SeaHorn [42], use LLVM IR for this purpose. These tools support a rich subset of C. They typically model memory as a linear array of bytes, which is not ideal for modeling kernel source code. ...
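To illustrate the byte-level memory model the excerpt refers to, here is a hedged C sketch (the array size and helper names are ours, not from any cited tool): memory is a single flat byte array, and word-sized accesses reassemble values from individual bytes.

#include <stdint.h>
#include <string.h>

#define MEM_SIZE 4096          // illustrative size, not from any tool
static uint8_t mem[MEM_SIZE];  // the whole heap as one linear byte array

// A 32-bit load gathers four consecutive bytes (host byte order here).
static uint32_t load32(uint32_t addr) {
  uint32_t v;
  memcpy(&v, &mem[addr], sizeof v);
  return v;
}

static void store32(uint32_t addr, uint32_t v) {
  memcpy(&mem[addr], &v, sizeof v);
}

int main(void) {
  store32(8, 0xDEADBEEF);
  return load32(8) == 0xDEADBEEF ? 0 : 1; // round-trip check
}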
Chapter
Full-text available
This paper proposes an automated method to check the correctness of range analysis used in the Linux kernel's eBPF verifier. We provide the specification of soundness for range analysis performed by the eBPF verifier. We automatically generate verification conditions that encode the operation of the eBPF verifier directly from the Linux kernel's C source code and check it against our specification. When we discover instances where the eBPF verifier is unsound, we propose a method to generate an eBPF program that demonstrates the mismatch between the abstract and the concrete semantics. Our prototype automatically checks the soundness of 16 versions of the eBPF verifier in the Linux kernel versions ranging from 4.14 to 5.19. In this process, we have discovered new bugs in older versions and proved the soundness of range analysis in the latest version of the Linux kernel.
... SeaHorn [38] uses Clang as its frontend and generates verification conditions from Llvm-IR. Smack [67] translates Llvm-IR to Boogie [68], an intermediate verification language, and relies on verifiers for Boogie as its backend. MLIR [20] is an extension of Llvm to flexibly define "dialects" of Llvm-IR and automatically generate translators between the dialects. ...
... Similarly, Btor2MLIR [28] defines a dialect for Btor2 in the MLIR [20] framework and translates a Btor2 model into Llvm-IR. Software analyzers consuming Llvm-IR, such as SeaHorn [38], Smack [67], and Klee [64], can then be used to examine properties of Btor2 circuits. ...
Chapter
Full-text available
Transformation plays a key role in verification technology, conveying information across different abstraction layers and underpinning the correctness, efficiency, and usability of formal-methods tools. Nevertheless, transformation procedures are often tightly coupled with individual verifiers, and thus hard to reuse across different tools. The lack of modularity incurs repeated engineering effort and the risk of bugs in the process of ‘reinventing the wheel’. It can be seen as a new paradigm to construct verification technology by employing standardized formats and interfaces for information exchange, and by building modular transformers between verification artifacts. Following this paradigm of modular transformation, recent works have (1) enhanced and complemented the state of the art by transforming verification tasks and applying tools for other modeling languages or specifications, (2) built new tools by combining mature ones via standardized formats for exchanging verification artifacts, and (3) communicated certificates of verification results to improve usability and explainability. In this paper, we survey existing transformation procedures and advocate the paradigm of modular transformation and exchange formats. Our vision is an ecosystem of reusable verification components that supports joining forces of all available techniques, allows agile development of new tools, and provides a common ground to evaluate and compare future scientific advancements: via modular transformation.
... BMC (Bounded Model Checking) [15,16,25] is a widely used technique for verifying memory-safety properties in unsafe Rust. It encodes program traces as symbolic SAT/SMT problems and employs solvers to provide bounded proofs. ...
... ing [14,20,21,30] and deductive proving [2,8], are used to verify functional properties in safe Rust. Techniques like abstract interpretation [10,22], symbolic execution [19,26,28], and bounded model checking (BMC) [15,16,25] are focused on ensuring memory safety properties. BMC stands out by encoding program traces into symbolic SAT/SMT problems for solver-based automatic verification. ...
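The following small C example sketches, under our own simplifying assumptions, what the BMC encoding mentioned above does: the loop is unrolled to a fixed bound, each assignment becomes an equation over indexed copies of the variables, and the negated assertion is conjoined for a SAT/SMT solver to refute.

int main(void) {
  int x = 0;
  for (int i = 0; i < 3; i++) { // bound k = 3, fully unrolled by BMC
    x += 2;
  }
  // Roughly encoded as: x1 = x0 + 2 /\ x2 = x1 + 2 /\ x3 = x2 + 2,
  // conjoined with !(x3 == 6); this is unsatisfiable, so the check
  // below holds within the bound.
  return x == 6 ? 0 : 1;
}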
Chapter
Full-text available
Rust has gained popularity as a safer alternative to C/C++ for low-level programming due to its memory-safety features and minimal runtime overhead. However, the use of the “unsafe” keyword allows developers to bypass safety guarantees, posing memory-safety risks. Bounded Model Checking (BMC) is commonly used to detect memory-safety problems, but it has limitations for large-scale programs, as it can only detect bugs within a bounded number of executions. In this paper, we introduce UnsafeCop, which utilizes and enhances BMC for analyzing memory safety in real-world unsafe Rust code. Our methodology incorporates harness design, loop bound inference, and both loop and function stubbing for comprehensive analysis. We optimize verification efficiency through a strategic function verification order, leveraging both types of stubbing. We conducted a case study on TECC (Trusted-Environment-based Cryptographic Computing), a proprietary framework consisting of 30,174 lines of Rust code, including 3,019 lines of unsafe Rust code, developed by Ant Group. Experimental results demonstrate that UnsafeCop effectively detects and verifies dozens of memory safety issues, reducing verification time by 73.71% compared to the traditional non-stubbing approach, highlighting its practical effectiveness.
... In our case study, we used the BMC engines of SeaHorn [13] and SMACK [1], and the symbolic execution tool KLEE [14]. We have chosen SeaHorn and SMACK because they are conceptually similar to CBMC that was used in Chong et al. ...
... Lastly, we used small memory allocations (up to 4 KiB) since using larger sizes leads to an exponential number of program paths to explore. SMACK [1], similarly to SeaHorn, is an LLVM-based BMC verification tool that uses the Corral [4] verifier as its backend. It can be applied to validate program assertions up to a given loop and recursion bound. ...
Article
Full-text available
A recent case study from AWS by Chong et al. proposes an effective methodology for Bounded Model Checking in industry. In this paper, we report on a follow-up case study that explores the methodology from the perspective of three research questions: (a) can proof artefacts be used across verification tools; (b) are there bugs in verified code; and (c) can specifications be improved. To study these questions, we port the verification tasks for aws-c-common library to SeaHorn, SMACK and KLEE. We show the benefits of using compiler semantics and cross-checking specifications with different verification techniques, and call for standardizing proof library extensions to increase specification reuse. The verification tasks discussed are publicly available online.
... If the conjunction of the expressions above is satisfiable, the unsafe state is reachable, and therefore, the concurrent program is unsafe. Dartagnan is a software verification tool, complete with an integration with Smack [29], an LLVM-based program transformation tool that allows Dartagnan to work on formal models rather than source-level programs. The gap between the higher-level LLVM-IR and the ISA of the target architecture is bridged by using compiler mappings for translating e.g. ...
Article
Full-text available
Communication models are a key aspect in the design and implementation of distributed system architectures. Application logic must consider the guarantees of these models, which fundamentally influence its correctness. Modern multi-core processor architectures face a similar problem when it comes to accessing shared memory: the guarantees of an architecture have a fundamental impact on the observable behavior of software. The formalization of these guarantees in a declarative way has led to powerful tools and algorithms to define reusable constraints on patterns of memory access events and their relationships, enabling the efficient description and automatic formal analysis of software properties with respect to a specific architecture. The Cat memory modeling language provides a standard means of specifying these constraints. Despite the parallels, the axiomatic modeling and analysis of communication models in distributed systems remain a relatively unexplored area. In this paper, we address this gap and demonstrate how communication models can be mapped to the Cat language. We create a standard library of reusable patterns and demonstrate our approach, called NetworCat, on the simple examples of UDP and TCP, and we also present its applicability to the vastly configurable OMG-DDS service. This adaptation-based approach enables the use of ever-improving verification tools built for shared memory concurrency on distributed systems. We believe this not only benefits distributed system analyses by broadening the toolset for verification but also positively impacts the field of memory-model-aware verification by widening its audience to another domain.
... Within the framework of Hoare logic (Hoare, 1969), loop invariants, which are usually assertions that are always true during the entry and execution of the loop, serve as the abstractions of loop properties. However, in the practical implementation of program verification tools such as CPAchecker (Beyer and Keremoglu, 2011), CBMC (Kroening and Tautschnig, 2014), SMACK (Rakamarić and Emmi, 2014) and Frama-C (Kirchner et al., 2015), the derivation of loop invariants often necessitates manual intervention by domain experts, which poses a substantial impediment to the full automation of program verification. ...
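As a reminder of what such tools must infer, here is a small C example (ours) with its loop invariant stated as a comment: the invariant holds on entry, is preserved by each iteration, and together with the exit condition implies the postcondition.

// Assumes n >= 0.
int sum_to(int n) {
  int s = 0;
  int i = 0;
  // invariant: 0 <= i <= n  &&  s == i * (i + 1) / 2
  while (i < n) {
    i++;
    s += i;
  }
  // at exit i == n, so s == n * (n + 1) / 2
  return s;
}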
Preprint
Full-text available
Automated program verification has always been an important component of building trustworthy software. While the analysis of real-world programs remains a theoretical challenge, the automation of loop invariant analysis has effectively resolved the problem. However, real-world programs that often mix complex data structures and control flows pose challenges to traditional loop invariant generation tools. To enhance the applicability of invariant generation techniques, we propose ACInv, an Automated Complex program loop Invariant generation tool, which combines static analysis with Large Language Models (LLMs) to generate the proper loop invariants. We utilize static analysis to extract the necessary information for each loop and embed it into prompts for the LLM to generate invariants for each loop. Subsequently, we employ an LLM-based evaluator to assess the generated invariants, refining them by either strengthening, weakening, or rejecting them based on their correctness, ultimately obtaining enhanced invariants. We conducted experiments on ACInv, which showed that ACInv outperformed previous tools on data sets with data structures, and maintained similar performance to the state-of-the-art tool AutoSpec on numerical programs without data structures. On the full data set, ACInv can solve 21% more examples than AutoSpec and can generate reference data structure templates.
... The interleaving problem can also be addressed by translating the concurrent programs into sequential programs. Tools implementing this technique include MU-CSeq [38], Lazy-CSeq [25], and SMACK [34], etc. However, all of these works give an exact encoding of the scheduling constraint, while we ignore this constraint and employ a scheduling-constraint-based abstraction refinement method to obtain a small yet effective abstraction w.r.t. the property. ...
Preprint
Bounded model checking is among the most efficient techniques for the automatic verification of concurrent programs. However, encoding all possible interleavings often requires a huge and complex formula, which significantly limits the scalability. This paper proposes a novel and efficient abstraction refinement method for multi-threaded program verification. Observing that the huge formula is usually dominated by the exact encoding of the scheduling constraint, this paper proposes a scheduling-constraint-based abstraction refinement method, which avoids the huge and complex encoding of BMC. In addition, to obtain an effective refinement, we have devised two graph-based algorithms over the event order graph for counterexample validation and refinement generation, which can always obtain a small yet effective refinement constraint. Enhanced by these algorithms, we have proved that our method is sound and complete w.r.t. the given loop unwinding depth. Experimental results on SV-COMP concurrency benchmarks indicate that our method is promising and significantly outperforms the existing state-of-the-art tools.
... There exist several projects that develop formal static analysis techniques for bug finding in LLVM IR. SMACK [32] defines a translation of LLVM IR into BoogiePL [20], to reason about C programs using assertions that are compiled into LLVM IR using Clang. The verification itself is bounded and a potential extension to contract specifications has not yet been explored. ...
Chapter
Full-text available
Over the last years, deductive program verifiers have substantially improved, and their applicability on non-trivial applications has been demonstrated. However, a major bottleneck is that for every new programming language, a new deductive verifier has to be built. This paper describes the first steps in a project that aims to address this problem, by language-agnostic support for deductive verification: Rather than building a deductive program verifier for every programming language, we develop deductive program verification technology for a widely-used intermediate representation language (LLVM IR), such that we eventually get verification support for any language that can be compiled into the LLVM IR format. Concretely, this paper describes the design of VCLLVM, a prototype tool that adds LLVM IR as a supported language to the VerCors verifier. We discuss the challenges that have to be addressed to develop verification support for such a low-level language. Moreover, we also sketch how we envisage building verification support for any specified source program that can be compiled into LLVM IR on top of VCLLVM.
... Note that the framework in Fig. 1b is not limited to Btor2C and verifiers for C programs. For example, one could also materialize the concept with the translator Btor2MLIR [6], analyzers for LLVM-bytecode programs [11], such as Klee [16], Smack [17], and SeaHorn [18], and a corresponding LLVM-to-Btor2 witness translator. There also exist translators [19,20,21] from Verilog [22] circuits to C programs or SMV [23] models. ...
Chapter
Full-text available
Formal verification is essential but challenging: Even the best verifiers may produce wrong verification verdicts. Certifying verifiers enhance the confidence in verification results by generating a witness for other tools to validate the verdict independently. Recently, translating the hardware-modeling language Btor2 to software, such as the programming language C or LLVM intermediate representation, has been actively studied and facilitated verifying hardware designs by software analyzers. However, it remained unknown whether witnesses produced by software verifiers contain helpful information about the original circuits and how such information can aid hardware analysis. We propose a certifying and validating framework Btor2-Cert to verify safety properties of Btor2 circuits, combining Btor2-to-C translation, software verifiers, and a new witness validator Btor2-Val, to answer the above open questions. Btor2-Cert translates a software violation witness to a Btor2 violation witness; as the Btor2 language lacks a format for correctness witnesses, we encode invariants in software correctness witnesses as Btor2 circuits. The validator Btor2-Val checks violation witnesses by circuit simulation and correctness witnesses by validation via verification. In our evaluation, Btor2-Cert successfully utilized software witnesses to improve quality assurance of hardware. By invoking the software verifier Cbmc on translated programs, it uniquely solved, with confirmed witnesses, 8% of the unsafe tasks for which the hardware verifier ABC failed to detect bugs.
... We also use Linux device drivers benchmarks available as part of the SV-COMP [6] benchmark suite. We use SMACK [27] to compile these drivers to Boogie and instrument the properties to be verified. ...
Chapter
Full-text available
We propose a novel lazy bounded model checking (BMC) algorithm, Trace Inlining, that identifies relevant behaviors of the program to compute partial proofs as procedural summaries. Whenever procedures are reused in other contexts, Trace Inlining attempts to construct safety proofs using these summaries. If the current summaries are sufficient to complete the proof, it gains both in solving time and in smaller encodings. If the summaries are found to be insufficient, they are automatically refined for future use. The partial proofs are enabled by a sequence of alternating underapproximation and overapproximation rounds until the program verification condition is found to be unsatisfiable. We evaluate our Trace Inlining algorithm on real-world benchmarks consisting of Windows and Linux device drivers. Our results show that the proposed algorithm is able to solve 12% additional benchmarks that were unsolved by the state-of-the-art lazy BMC solvers Corral and Legion. Further, Trace Inlining is 6× faster than Corral and 3× faster than Legion in terms of verification time. The virtual best of all three verifiers is 4× faster than the virtual best of Corral and Legion, implying that our technique significantly improves on what is possible today.
... For a new Rust library that uses unsafe Rust code, RustBelt can tell the verification conditions that the library should meet to be considered as a safe extension to the Rust language. Baranowski et al. [6] extend the SMACK verifier [41], a software verification toolchain, to enable its usage on Rust programs. In addition, Astrauskas et al. [3] propose a verification technique that utilizes the type system of Rust to simplify the specification and verification of Rust programs. ...
Article
Full-text available
Rust is an emerging programming language designed for the development of systems software. To facilitate the reuse of Rust code, crates.io, as a central package registry of the Rust ecosystem, hosts thousands of third-party Rust packages. The openness of crates.io enables the growth of the Rust ecosystem but comes with security risks, as evidenced by severe security advisories. Although Rust guarantees a software program to be safe via programming language features and strict compile-time checking, the unsafe keyword in Rust allows developers to bypass compiler safety checks for certain regions of code. Prior studies empirically investigate the memory safety and concurrency bugs in the Rust ecosystem, as well as the usage of unsafe keywords in practice. Nonetheless, the literature lacks a systematic investigation of the security risks in the Rust ecosystem. In this paper, we perform a comprehensive investigation into the security risks present in the Rust ecosystem, asking “what are the characteristics of the vulnerabilities, what are the characteristics of the vulnerable packages, and how are the vulnerabilities fixed in practice?”. To facilitate the study, we first compile a dataset of 433 vulnerabilities, 300 vulnerable code repositories, and 218 vulnerability fix commits in the Rust ecosystem, spanning over 7 years. With the dataset, we characterize the types, life spans, and evolution of the disclosed vulnerabilities. We then characterize the popularity, categorization, and vulnerability density of the vulnerable Rust packages, as well as their versions and code regions affected by the disclosed vulnerabilities. Finally, we characterize the complexity of vulnerability fixes and localities of corresponding code changes, and inspect how practitioners fix vulnerabilities in Rust packages with various localities. We find that memory safety and concurrency issues account for nearly two thirds of the vulnerabilities in the Rust ecosystem. It takes over 2 years for the vulnerabilities to become publicly disclosed, and one third of the vulnerabilities have no fixes committed before their disclosure. In terms of vulnerability density, we observe a continuous upward trend at the package level over time, but a decreasing trend at the code level since August 2020. In the vulnerable Rust packages, the vulnerable code tends to be localized at the file level, and contains statistically significantly more unsafe functions and blocks than the rest of the code. More popular packages tend to have more vulnerabilities, while the less popular packages suffer from vulnerabilities for more versions. The vulnerability fix commits tend to be localized to a limited number of lines of code. Developers tend to address vulnerable safe functions by adding safe functions or lines to them, vulnerable unsafe blocks by removing them, and vulnerable unsafe functions by modifying unsafe trait implementations. Based on our findings, we discuss implications, provide recommendations for software practitioners, and outline directions for future research.
... The results show that our term rewriting system alone is able to prove almost all the benchmarks. FISCHER is also considerably more efficient than the general-purpose verifiers SMACK [55], SeaHorn, CPAChecker, and Symbiotic [22], the cryptography-specific verifier CryptoLine, as well as a straightforward approach that directly reduces the verification task to SMT solving. For instance, our approach is able to handle masked implementations of finite-field multiplication with masking orders up to 100 in less than 153 s, while none of the compared approaches can handle masking order of 3 in 20 min. ...
Chapter
Full-text available
Masking is a widely-used effective countermeasure against power side-channel attacks for implementing cryptographic algorithms. Surprisingly, few formal verification techniques have addressed a fundamental question, i.e., whether the masked program and the original (unmasked) cryptographic algorithm are functionally equivalent. In this paper, we study this problem for masked arithmetic programs over Galois fields of characteristic 2. We propose an automated approach based on term rewriting, aided by random testing and SMT solving. The overall approach is sound, and complete under certain conditions which are met in practice. We implement the approach as a new tool, FISCHER, and carry out extensive experiments on various benchmarks. The results confirm the effectiveness, efficiency and scalability of our approach. Almost all the benchmarks can be proved for the first time by the term rewriting system alone. In particular, FISCHER detects a new flaw in a masked implementation published in EUROCRYPT 2017.
... In the field of automated verification, bounded model checkers have been successful because of their ability to verify (or find bugs in) large programs with bit-level accuracy and minimal user annotation. Other successful bounded model checkers include SMACK [18], which uses the LLVM toolchain with Boogie as its backend, and ESBMC [8], which uses SMT solvers rather than a SAT solver. ...
Chapter
Full-text available
Constraint programming systems allow a diverse range of problems to be modelled and solved. Most systems require the user to learn a new constraint programming language, which presents a barrier to novice and casual users. To address this problem, we present the CoPTIC constraint programming system, which allows the user to write a model in the well-known programming language C, augmented with a simple API to support using a guess-and-check paradigm. The resulting model is at most as complex as an ordinary C program that uses naive brute force to solve the same problem. CoPTIC uses the bounded model checker CBMC to translate the model into a SAT instance, which is solved using the SAT solver CaDiCaL. We show that, while this is less efficient than a direct translation from a dedicated constraint language into SAT, performance remains adequate for casual users. CoPTIC supports constraint satisfaction and optimisation problems, as well as enumeration of multiple solutions. After a solution has been found, CoPTIC allows the model to be run with the solution; this makes it easy to debug a model, or to print the solution in any desired format.
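The guess-and-check paradigm described above can be approximated with standard SV-COMP-style verifier intrinsics; the sketch below is ours and does not use CoPTIC's actual API. Candidate values are guessed nondeterministically, the constraints are assumed, and a final assert(0) makes the model checker's counterexample double as a solution.

#include <assert.h>

extern int __VERIFIER_nondet_int(void);
extern void __VERIFIER_assume(int cond);

int main(void) {
  int x = __VERIFIER_nondet_int(); // guess
  int y = __VERIFIER_nondet_int();
  __VERIFIER_assume(x > 0 && y > 0 && x < 100 && y < 100);
  __VERIFIER_assume(x * x + y * y == 25); // check: the constraint
  assert(0); // the counterexample trace exhibits a solution, e.g. x=3, y=4
  return 0;
}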
... Along with path slicing, we also add procedure summaries for specific modules to obtain more detailed and context-sensitive [16] information. In the last phase, we applied path slicing and a constraint solver to enhance the precision of our approach [27]. Algorithm 1 presents the phase-wise steps of the proposed approach. ...
Article
Full-text available
Code analysis has discovered that memory leaks are common in the C programming language. In the literature, there exist various approaches for statically analyzing and detecting memory leaks. The complexity and diversity of memory leaks make it difficult to find an approach that is both effective and simple. In embedded systems, costly resources like memory become limited as the system's size diminishes. As a result, memory must be handled effectively and efficiently too. To obtain precise analysis, we propose a novel approach that works in a phase-wise manner. Instead of examining all possible paths for finding memory leaks, we use program slicing to check for a potential memory leak. We introduce a source-sink flow graph (SSFG) based on source-sink properties of memory allocation-deallocation within the C code. To achieve simplicity in analysis, we also keep the complexity of the analysis linear in time. In addition, we utilize a constraint solver to improve the effectiveness of our approach. To evaluate the approach, we perform manual scanning on various test cases: linked list applications, Juliet test cases, and common vulnerabilities and exposures found in 2021. The results show the efficiency of the proposed approach, which constructs the SSFG with linear complexity.
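The source-sink pattern that the SSFG tracks can be seen in a minimal C example (ours): malloc is the source, free is the sink, and any path that exits between them without reaching the sink is a leak.

#include <stdlib.h>

int process(int n) {
  int *buf = malloc(16 * sizeof(int)); // source
  if (buf == NULL) return -1;
  if (n < 0) return -1; // BUG: this path leaks buf (sink never reached)
  buf[0] = n;
  free(buf); // sink
  return 0;
}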
Article
A program’s exceptional behavior can substantially complicate its control flow, and hence accurate reasoning about the program’s correctness. On the other hand, formally verifying realistic programs is likely to involve exceptions—a ubiquitous feature in modern programming languages. In this paper, we present a novel approach to verify the exceptional behavior of Java programs, which extends our previous work on ByteBack. ByteBack works on a program’s bytecode, while providing means to specify the intended behavior at the source-code level; this approach sets ByteBack apart from most state-of-the-art verifiers that target source code. To explicitly model a program’s exceptional behavior in a way that is amenable to formal reasoning, we introduce Vimp: a high-level bytecode representation that extends the Soot framework’s Grimp with verification-oriented features, thus serving as an intermediate layer between bytecode and the Boogie intermediate verification language. Working on bytecode through this intermediate layer brings flexibility and adaptability to new language versions and variants: as our experiments demonstrate, ByteBack can verify programs involving exceptional behavior in all versions of Java, as well as in Scala and Kotlin (two other popular JVM languages).
Chapter
In this paper we introduce a novel compositional technique for inter-procedural program analysis for checking temporal properties parametrized over several objects. The method is fully automated, scales to large code, and is easily extensible. We have implemented our technique at Meta. It now constitutes one of the core engines of PrivacyCAT [41], a code analysis system developed and used at WhatsApp to detect privacy defects in code at an early stage of development. PrivacyCAT runs continuously on WhatsApp server code. Our experience shows that this technique is able to detect numerous real vulnerabilities and help developers fix them before those vulnerabilities could affect users. This paper focuses on the theoretical foundations of the technique.
Article
Timing side-channel attacks exploit secret-dependent execution time to fully or partially recover secrets of cryptographic implementations, posing a severe threat to software security. The constant-time programming discipline is an effective software-based countermeasure against timing side-channel attacks, but developing constant-time implementations turns out to be challenging and error-prone. Current verification approaches/tools suffer from scalability and precision issues when applied to production software in practice. In this paper, we put forward practical verification approaches based on a novel synergy of taint analysis and safety verification of self-composed programs. Specifically, we first use an IFDS-based lightweight taint analysis to prove that a large number of potential (timing) side-channel sources do not actually leak secrets. We then resort to a precise taint analysis and a safety verification approach to determine whether the remaining potential side-channel sources can actually leak secrets. These include novel constructions of a taint-directed semi-cross-product of the original program and its Boolean abstraction, and a taint-directed self-composition of the program. Our approach is implemented as a cross-platform and fully automated tool, CT-Prover. The experiments confirm its efficiency and effectiveness in verifying real-world benchmarks from modern cryptographic and SSL/TLS libraries. In particular, CT-Prover identifies new, confirmed vulnerabilities of open-source SSL libraries (e.g., Mbed SSL, BearSSL) and significantly outperforms the state-of-the-art tools.
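For readers unfamiliar with the constant-time discipline being verified, here is a standard illustrative contrast in C (our example, not from the paper): the first comparison leaks through timing because it exits as soon as a byte differs; the second touches every byte regardless of the secret's contents.

#include <stddef.h>

// Leaky: the early return makes running time depend on secret data.
int leaky_cmp(const unsigned char *a, const unsigned char *b, size_t n) {
  for (size_t i = 0; i < n; i++)
    if (a[i] != b[i]) return 1;
  return 0;
}

// Constant-time: accumulate differences without secret-dependent branches.
int ct_cmp(const unsigned char *a, const unsigned char *b, size_t n) {
  unsigned char diff = 0;
  for (size_t i = 0; i < n; i++)
    diff |= a[i] ^ b[i];
  return diff != 0;
}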
Article
This paper presents a technique for repairing errors in GPU kernels written in CUDA or OpenCL due to data races and barrier divergence. Our novel extension to prior work can also remove barriers that are deemed unnecessary for correctness. We implement these ideas in our tool called GPURepair, which uses GPUVerify as the verification oracle for GPU kernels. We also extend GPUVerify to support CUDA Cooperative Groups, allowing GPURepair to suggest inter-block synchronization for repairing a CUDA kernel if deemed necessary. To the best of our knowledge, GPURepair is the only tool that can propose a fix for intra-block data races and barrier divergence errors for both CUDA and OpenCL kernels. It is also the only tool that can propose fixes for inter-block data races in CUDA kernels. We perform extensive experiments on about 750 kernels and provide a comparison with prior work. We demonstrate the superiority of GPURepair through its capability to fix more kernels and its unique ability to remove redundant barriers and handle inter-block data races. We have also enhanced the initial version of GPURepair to support incremental solving during the repair process. This enhancement improves the performance of GPURepair by about 25% for the test suite that we have used.
Chapter
The breakneck evolution of modern programming languages aggravates the development of deductive verification tools, which struggle to timely and fully support all new language features. To address this challenge, we present ByteBack: a verification technique that works on Java bytecode. Compared to high-level languages, intermediate representations such as bytecode offer a much more limited and stable set of features; hence, they may help decouple the verification process from changes in the source-level language. ByteBack offers a library to specify functional correctness properties at the level of the source code, so that the bytecode is only used as an intermediate representation that the end user does not need to work with. Then, ByteBack reconstructs some of the information about types and expressions that is erased during compilation into bytecode but is necessary to correctly perform verification. Our experiments with an implementation of ByteBack demonstrate that it can successfully verify bytecode compiled from different versions of Java, including several modern language features that even state-of-the-art Java verifiers (such as KeY and OpenJML) do not directly support—thus revealing how ByteBack's approach can help verification technology keep up with language evolution.
Chapter
Rust is a promising system-level programming language that can prevent memory corruption bugs using its strong type system and ownership-based memory management scheme. In practice, programmers usually write Rust code in conjunction with other languages such as C/C++ through the Foreign Function Interface (FFI). For example, many notable projects are developed using Rust and other programming languages, such as Firefox, Google Fuchsia OS, and the Linux kernel. Although it is widely believed that gradually re-implementing security-critical components in Rust is a way of enhancing software security, using FFI is inherently unsafe. In this paper, we show that memory management across the FFI boundaries is error-prone. Any incorrect use of FFI may corrupt Rust's ownership system, leading to memory safety issues. To tackle this problem, we design and build FFIChecker, an automated static analysis and bug detection tool dedicated to memory management issues across the Rust/C FFI. We evaluate our tool by checking 987 Rust packages crawled from the official package registry and reveal 34 bugs in 12 packages. Our experiments show that FFIChecker is a useful tool to detect real-world cross-language memory management issues with a reasonable amount of computational resources.
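A typical allocator-mismatch bug of the kind FFIChecker targets can be sketched from the C side (our hypothetical example): memory allocated with malloc must come back to free, but a Rust caller that takes ownership of the raw pointer (e.g. via Box::from_raw) would release it with Rust's allocator instead, mixing two allocators.

#include <stdlib.h>

// Hands out C-allocated memory across the FFI boundary.
char *make_buffer(size_t n) {
  return malloc(n); // must be released with free(), not Rust's allocator
}

// The safe pattern: ownership returns across the boundary for release.
void release_buffer(char *p) {
  free(p);
}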
Chapter
Full-text available
We give an overview of the development of software model checking, a general approach to algorithmic program verification that integrates static analysis, model checking, and deduction. We start with a look backwards and briefly cover some of the important steps in the past decades. The general approach has become a research topic on its own, with a wide range of tools that are based on the approach. Therefore, we discuss the maturity of the research area of software model checking in terms of looking at competitions, at citations, and most importantly, at the tools that were built in this area: we count 76 verification systems for software written in C or Java. We conclude that software model checking has quickly grown to a significant field of research with a high impact on current research directions and tools in software verification. Keywords: History, Software Verification, Programming, Formal Methods, Program Correctness, Automatic Verification, Verification Tools, Provers
Chapter
Modularity is indispensable for scaling automatic verification to large programs. However, modularity also introduces challenges because it requires inferring and abstracting the behavior of functions as summaries – formulas that relate the function's inputs and outputs. For programs manipulating memory, summaries must include the function's frame, i.e., how the content of memory is affected by the execution of the function. In SMT-based model checking, memory is often modeled with (unbounded) logical arrays, and expressing frames generally requires universally quantified formulas. Such formulas significantly complicate inference and subsequent reasoning and are thus to be avoided. In this paper, we present a technique to explicitly encode the bounded parts of memory, eliminating the need for quantified summaries. We build on the insight that the size of frames can be statically known. This enables replacing unbounded arrays with finite maps – a finite collection of key-value pairs. Specifically, we develop a new static analysis to infer the finite parts of a function's frame. We then extend the theory of arrays to the theory of finite maps and show that satisfiability of Constrained Horn Clauses (CHCs) over finite maps is reducible to satisfiability of CHCs over the base theory. Finally, we propose a new encoding from imperative programs to CHCs that uses finite maps to model explicitly the finite memory passed in function calls. The result is a new verification strategy that preserves the advantages of modularity while reducing the need for quantified frames. We have implemented this approach in SeaHorn, a state-of-the-art CHC-based software model checker for LLVM. An evaluation on Linux Drivers from SV-COMP shows the effectiveness of our technique. Keywords: Modular verification, Software model checking, Constrained Horn clauses, Pointer analysis
Chapter
Full-text available
Bounded Model Checking (BMC) is a popularly used strategy for program verification and it has been explored extensively over the past decade. Despite such a long history, BMC still faces scalability challenges as programs continue to grow larger and more complex. One approach that has proven to be effective in verifying large programs is called Counterexample Guided Abstraction Refinement (CEGAR). In this work, we propose a complementary approach to CEGAR for bounded model checking of sequential programs: in contrast to CEGAR, our algorithm gradually widens underapproximations of a program, guided by the proofs of unsatisfiability. We implemented our ideas in a tool called Legion. We compare the performance of Legion against that of Corral, a state-of-the-art verifier from Microsoft, that utilizes the CEGAR strategy. We conduct our experiments on 727 Windows and Linux device driver benchmarks. We find that Legion is able to solve 12% more instances than Corral and that Legion exhibits a complementary behavior to that of Corral. Motivated by this, we also build a portfolio verifier, Legion+, that attempts to draw the best of Legion and Corral. Our portfolio, Legion+, solves 15% more benchmarks than Corral with similar computational resource constraints (i.e. each verifier in the portfolio is run with a time budget that is half of the time budget of Corral). Moreover, it is found to be 2.9× faster than Corral on benchmarks that are solved by both Corral and Legion+.
Article
The Internet of Things (IoT) provides convenience for our daily lives via a huge number of devices. However, due to low resources and poor computing capability, these devices have a high number of firmware vulnerabilities. Software verification is a powerful solution to ensure the correctness and security of IoT firmware programs. Unfortunately, due to the complex semantics and syntax of program languages (typically C), applying software verification in IoT firmware faces a tradeoff between efficiency and accuracy. One of the fundamental reasons is that verification methods cannot support verifying state transitions on the memory space caused by pointer operations well. To this end, by combining sparse value flow (SVF) analysis with model checking and optimizing computational redundancy between them, we design a novel points-to-sensitive model checker, called PCHECKER, which can provide highly precise and efficient verification for IoT firmware programs. We first design a spatial flow model to effectively describe the state behaviors of a C program both on the symbolic and the memory space. We then propose a counterexample-guided model checking algorithm that can dynamically refine abstract precisions and update nondeterministic points-to relations. With a set of C benchmarks containing a variety of pointer operations and other complex C features, our experiments have shown that, compared with the state of the art (SOTA), PCHECKER achieves outstanding results in the verification of C programs: its verification accuracy is 95.9%, and its average verification time per line of code is 1.27 ms, both better than existing model checkers.
Chapter
Full-text available
In rational synthesis, we automatically construct a reactive system that satisfies its specification in all rational environments, namely environments that have objectives and act to fulfill them. We complete the study of the complexity of LTL rational synthesis. Our contribution is threefold. First, we tighten the known upper bounds for settings that were left open in earlier work. Second, our complexity analysis is parametric, and we describe tight upper and lower bounds in each of the problem parameters: the game graph, the objectives of the system components, and the objectives of the environment components. Third, we generalize the definition of rational synthesis, combining the cooperative and non-cooperative approaches studied in earlier work, and extend our complexity analysis to the general definition.
Chapter
Full-text available
We report our experience in the formal verification of the reference implementation of the Beacon Chain. The Beacon Chain is the backbone component of the new Proof-of-Stake Ethereum 2.0 network: it is in charge of tracking information about the validators, their stakes, and their attestations (votes), and, if some validators are found to be dishonest, of slashing them (they lose some of their stakes). The Beacon Chain is mission-critical and any bug in it could compromise the whole network. The Beacon Chain reference implementation developed by the Ethereum Foundation is written in Python, and provides a detailed operational description of the state machine each Beacon Chain network participant (node) must implement. We have formally specified and verified the absence of runtime errors in (a large and critical part of) the Beacon Chain reference implementation using the verification-friendly language Dafny. During the course of this work, we have uncovered several issues and proposed verified fixes. We have also synthesised functional correctness specifications that enable us to provide guarantees beyond runtime errors. Our software artefact with the code and proofs in Dafny is available at https://github.com/ConsenSys/eth2.0-dafny .
Chapter
Full-text available
In a previous paper, we have shown that clause sets belonging to the Horn Bernays-Schönfinkel fragment over simple linear real arithmetic (HBS(SLR)) can be translated into HBS clause sets over a finite set of first-order constants. The translation preserves validity and satisfiability and it is still applicable if we extend our input with positive universally or existentially quantified verification conditions (conjectures). We call this translation a Datalog hammer. The combination of its implementation in SPASS-SPL with the Datalog reasoner VLog establishes an effective way of deciding verification conditions in the Horn fragment. We verify supervisor code for two examples: a lane change assistant in a car and an electronic control unit of a supercharged combustion engine. In this paper, we improve our Datalog hammer in several ways: we generalize it to mixed real-integer arithmetic and finite first-order sorts; we extend the class of acceptable inequalities beyond variable bounds and positively grounded inequalities; and we significantly reduce the size of the hammer output by a soft typing discipline. We call the result the sorted Datalog hammer. It not only allows us to handle more complex supervisor code and to model already considered supervisor code more concisely, but it also improves our performance on real-world benchmark examples. Finally, we replace the previous file-based interface between SPASS-SPL and VLog with a close coupling, resulting in a single executable binary.
Chapter
Full-text available
Runtime verification (RV) enables monitoring systems at runtime, to detect property violations early and limit their potential consequences. This paper presents an end-to-end framework to capture requirements in structured natural language and generate monitors that capture their semantics faithfully. We leverage NASA's Formal Requirement Elicitation Tool (fret) and the RV system Copilot. We extend fret with mechanisms to capture additional information needed to generate monitors, and introduce Ogma, a new tool to bridge the gap between fret and Copilot. With this framework, users can write requirements in an intuitive format and obtain real-time C monitors suitable for use in embedded systems. Our toolchain is available as open source.
Chapter
Full-text available
Logic locking “hides” the functionality of a digital circuit to protect it from counterfeiting, piracy, and malicious design modifications. The original design is transformed into a “locked” design such that the circuit reveals its correct functionality only when it is “unlocked” with a secret sequence of bits—the key bit-string. However, strong attacks, especially the SAT attack that uses a SAT solver to recover the key bit-string, have been profoundly effective at breaking the locked circuit and recovering the circuit functionality. We lift logic locking to Higher Order Logic Locking (HOLL) by hiding a higher-order relation, instead of a key of independent values, challenging the attacker to discover this key relation to recreate the circuit functionality. Our technique uses program synthesis to construct the locked design and synthesize a corresponding key relation. HOLL has low overhead and existing attacks for logic locking do not apply as the entity to be recovered is no longer a value. To evaluate our proposal, we propose a new attack (SynthAttack) that uses an inductive synthesis algorithm guided by an operational circuit as an input-output oracle to recover the hidden functionality. SynthAttack is inspired by the SAT attack, and similar to the SAT attack, it is verifiably correct, i.e., if the correct functionality is revealed, a verification check guarantees the same. Our empirical analysis shows that SynthAttack can break HOLL for small circuits and small key relations, but it is ineffective for real-life designs.
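As background, conventional logic locking (the baseline that HOLL lifts to relations) can be illustrated with an XOR key gate; the C sketch below is ours, modeling gates as bitwise operations. With the correct key bit the inserted inversion is cancelled; with a wrong key the circuit misbehaves.

int original(int a, int b) {
  return a & b;
}

// An inverter is inserted and "corrected" by an XOR key gate: only the
// key value chosen at lock time (here k == 1) restores the function.
int locked(int a, int b, int k) {
  return (~(a & b) ^ k) & 1;
}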
Chapter
Full-text available
The most scalable approaches to certifying neural network robustness depend on computing sound linear lower and upper bounds for the network's activation functions. Current approaches are limited in that the linear bounds must be handcrafted by an expert, and can be sub-optimal, especially when the network's architecture composes operations using, for example, multiplication such as in LSTMs and the recently popular Swish activation. The dependence on an expert prevents the application of robustness certification to developments in the state-of-the-art of activation functions, and furthermore the lack of tightness guarantees may give a false sense of insecurity about a particular model. To the best of our knowledge, we are the first to consider the problem of automatically synthesizing tight linear bounds for arbitrary n-dimensional activation functions. We propose the first fully automated method that achieves tight linear bounds while only leveraging the mathematical definition of the activation function itself. Our method leverages an efficient heuristic technique to synthesize bounds that are tight and usually sound, and then verifies the soundness (and adjusts the bounds if necessary) using the highly optimized branch-and-bound SMT solver dReal. Even though our method depends on an SMT solver, we show that the runtime is reasonable in practice, and, compared with state of the art, our method often achieves 2-5X tighter final output bounds and more than quadruple certified robustness.
Chapter
Full-text available
We introduce a similarity function on formulae of signal temporal logic (STL). It comes in the form of a kernel function, well known in machine learning as a conceptually and computationally efficient tool. The corresponding kernel trick allows us to circumvent the complicated process of feature extraction, i.e. the (typically manual) effort to identify the decisive properties of formulae so that learning can be applied. We demonstrate this consequence and its advantages on the task of predicting (quantitative) satisfaction of STL formulae on stochastic processes: Using our kernel and the kernel trick, we learn (i) computationally efficiently, (ii) a practically precise predictor of satisfaction, (iii) avoiding the difficult task of finding a way to explicitly turn formulae into vectors of numbers in a sensible way. We back the high precision we have achieved in the experiments by a theoretically sound PAC guarantee, ensuring our procedure efficiently delivers a close-to-optimal predictor.
Article
Full-text available
Being able to build a map of the environment and to simultaneously localize within this map is an essential skill for mobile robots navigating in unknown environments in the absence of external referencing systems such as GPS. This so-called simultaneous localization and mapping (SLAM) problem has been one of the most popular research topics in mobile robotics for the last two decades, and efficient approaches for solving this task have been proposed. One intuitive way of formulating SLAM is to use a graph whose nodes correspond to the poses of the robot at different points in time and whose edges represent constraints between the poses. The latter are obtained from observations of the environment or from movement actions carried out by the robot. Once such a graph is constructed, the map can be computed by finding the spatial configuration of the nodes that is most consistent with the measurements modeled by the edges. In this paper, we provide an introductory description of the graph-based SLAM problem. Furthermore, we discuss a state-of-the-art solution that is based on least-squares error minimization and exploits the structure of the SLAM problem during optimization. The goal of this tutorial is to enable the reader to implement the proposed methods from scratch.
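The least-squares formulation mentioned above can be stated compactly: with the stacked poses \(\mathbf{x}\), an error function \(\mathbf{e}_{ij}\) per edge, and information matrices \(\Omega_{ij}\), the most consistent configuration is

    \[
      \mathbf{x}^{*} \;=\; \operatorname*{arg\,min}_{\mathbf{x}}
      \sum_{(i,j)} \mathbf{e}_{ij}(\mathbf{x})^{\top}\, \Omega_{ij}\, \mathbf{e}_{ij}(\mathbf{x})
    \]

typically solved by repeated linearization (Gauss-Newton or Levenberg-Marquardt) while exploiting the sparsity of the graph.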
Conference Paper
Full-text available
Frama-C is a source code analysis platform that aims at conducting verification of industrial-size C programs. It provides its users with a collection of plug-ins that perform static analysis, deductive verification, and testing, for safety- and security-critical software. Collaborative verification across cooperating plug-ins is enabled by their integration on top of a shared kernel and datastructures, and their compliance to a common specification language. This foundational article presents a consolidated view of the platform, its main and composite analyses, and some of its industrial achievements.
Conference Paper
Full-text available
Consider a sequential programming language with control flow constructs such as assignments, choice, loops, and procedure calls. We restrict the syntax of expressions in this language to one that can be efficiently decided by a satisfiability-modulo-theories solver. For such a language, we define the problem of deciding whether a program can reach a particular control location as the reachability-modulo-theories problem. This paper describes the architecture of Corral, a semi-algorithm for the reachability-modulo-theories problem. It further describes the novel algorithms that comprise the various components of Corral. Finally, the paper presents an evaluation of Corral against other related tools. Corral consistently outperforms its competitors on most benchmarks.
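A minimal sketch of a reachability-modulo-theories query as it is usually phrased in C front ends (a generic illustration, not Corral's input syntax):

    #include <assert.h>

    /* Is the assertion violation reachable for SOME input? A
       reachability-modulo-theories engine decides this by reasoning in
       the expression theory (here linear integer arithmetic) instead of
       enumerating inputs. */
    static int double_it(int n) { return 2 * n; }

    static void check(int input) {
        int y = double_it(input);
        assert(y != 10);   /* violated exactly when input == 5 */
    }

    int main(void) {
        check(3);          /* fine on this run; the verifier asks about all inputs */
        return 0;
    }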
Conference Paper
Full-text available
The Satisfiability Modulo Theories (SMT) problem is a decision problem for logical first-order formulas with respect to combinations of background theories such as arithmetic, bit-vectors, arrays, and uninterpreted functions. Z3 is a new and efficient SMT solver freely available from Microsoft Research. It is used in various software verification and analysis applications.
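A small sketch of posing such a query through Z3's public C API (error handling omitted; the calls below come from z3.h):

    #include <stdio.h>
    #include <z3.h>

    /* Ask Z3 whether x > 0 && x < 10 is satisfiable over the integers. */
    int main(void) {
        Z3_config cfg = Z3_mk_config();
        Z3_context ctx = Z3_mk_context(cfg);
        Z3_del_config(cfg);

        Z3_solver s = Z3_mk_solver(ctx);
        Z3_solver_inc_ref(ctx, s);

        Z3_sort int_sort = Z3_mk_int_sort(ctx);
        Z3_ast x = Z3_mk_const(ctx, Z3_mk_string_symbol(ctx, "x"), int_sort);
        Z3_solver_assert(ctx, s, Z3_mk_gt(ctx, x, Z3_mk_int(ctx, 0, int_sort)));
        Z3_solver_assert(ctx, s, Z3_mk_lt(ctx, x, Z3_mk_int(ctx, 10, int_sort)));

        printf("%s\n", Z3_solver_check(ctx, s) == Z3_L_TRUE ? "sat" : "unsat/unknown");

        Z3_solver_dec_ref(ctx, s);
        Z3_del_context(ctx);
        return 0;
    }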
Conference Paper
Full-text available
VCC is an industrial-strength verification environment for low-level concurrent system code written in C. VCC takes a program (annotated with function contracts, state assertions, and type invariants) and attempts to prove the correctness of these annotations. It includes tools for monitoring proof attempts and constructing partial counterexample executions for failed proofs. This paper motivates VCC, describes our verification methodology, describes the architecture of VCC, and reports on our experience using VCC to verify the Microsoft Hyper-V hypervisor.
Conference Paper
Full-text available
We provide a new characterization of scheduling nondeterminism by allowing deterministic schedulers to delay their next-scheduled task. By limiting the delays an otherwise-deterministic scheduler is allowed, we discover concurrency bugs efficiently (by exploring few schedules) and robustly (i.e., independently of the number of tasks, context switches, or buffered events). Our characterization applies elegantly to any systematic exploration (e.g., testing, model checking) of concurrent programs with dynamic task creation. Additionally, we show that certain delaying schedulers admit efficient reductions from concurrent to sequential program analysis.
Conference Paper
Full-text available
We describe an approach to testing complex safety critical software that combines unit-level symbolic execution and system-level concrete execution for generating test cases that satisfy user-specified testing criteria. We have developed Symbolic Java PathFinder, a symbolic execution framework that implements a non-standard bytecode interpreter on top of the Java PathFinder model checking tool. The framework propagates the symbolic information via attributes associated with the program data. Furthermore, we use two techniques that leverage system-level concrete program executions to gather information about a unit's input to improve the precision of the unit-level test case generation. We applied our approach to testing a prototype NASA flight software component. Our analysis helped discover a serious bug that resulted in design changes to the software. Although we give our presentation in the context of a NASA project, we believe that our work is relevant for other critical systems that require thorough testing.
Conference Paper
Full-text available
Reasoning about heap-allocated data structures such as linked lists and arrays is challenging. The reachability predicate has proved to be useful for reasoning about the heap in type-safe languages where memory is manipulated by dereferencing object fields. Sound and precise analysis for such data structures becomes significantly more challenging in the presence of low-level pointer manipulation that is prevalent in systems software. In this paper, we give a novel formalization of the reachability predicate in the presence of internal pointers and pointer arithmetic. We have designed an annotation language for C programs that makes use of the new predicate. This language enables us to specify properties of many interesting data structures present in the Windows kernel. We present preliminary experience with a prototype verifier on a set of illustrative C benchmarks.
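The low-level idiom that motivates this formalization is recovering an enclosing record from a pointer to one of its embedded fields, as in intrusive linked lists (a generic sketch in the spirit of such kernel code, not the paper's benchmarks):

    #include <stddef.h>

    struct entry  { struct entry *next; };          /* link lives inside the record */
    struct record { int data; struct entry link; };

    /* Pointer arithmetic recovers the record from its link field; this
       internal-pointer pattern is what defeats reachability reasoning
       designed around type-safe field dereferences. */
    struct record *record_of(struct entry *e) {
        return (struct record *)((char *)e - offsetof(struct record, link));
    }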
Conference Paper
Full-text available
This paper presents a model checking tool, SatAbs, that implements a predicate abstraction refinement loop. Existing software verification tools such as Slam, Blast, or Magic use decision procedures for abstraction and simulation that are limited to integers. SatAbs overcomes these limitations by using a SAT solver. This allows the model checker to handle the semantics of the ANSI-C standard accurately. This includes a sound treatment of bit-vector overflow, and of the ANSI-C pointer arithmetic constructs.
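Bit-level accuracy matters in cases like the following, where a tool reasoning over unbounded integers would miss the wrap-around (a generic illustration):

    #include <assert.h>

    int main(void) {
        unsigned char c = 255;
        c = c + 1;            /* wraps to 0 in 8-bit arithmetic */
        assert(c == 0);       /* holds for bit-vectors; a tool using
                                 unbounded integers would expect c == 256 */
        return 0;
    }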
Conference Paper
Full-text available
Context-sensitive pointer analysis algorithms with full “heap cloning” are powerful but are widely considered too expensive to include in production compilers. This paper shows, for the first time, that a context-sensitive, field-sensitive algorithm with full heap cloning (by acyclic call paths) can indeed be both scalable and extremely fast in practice. Overall, the algorithm is able to analyze programs in the range of 100K-200K lines of C code in 1-3 seconds, takes less than 5% of the time it takes for GCC to compile the code (which includes no whole-program analysis), and scales well across five orders of magnitude of code size. It is also able to analyze the Linux kernel (about 355K lines of code) in 3.1 seconds. The paper describes the major algorithmic and engineering design choices required to achieve these results, including (a) using flow-insensitive and unification-based analysis, which are essential to avoid exponential behavior in practice; (b) sacrificing context-sensitivity within strongly connected components of the call graph; and (c) carefully eliminating several kinds of O(N^2) behaviors (largely without affecting precision). The techniques used for (b) and (c) eliminated several major bottlenecks to scalability, and both are generalizable to other context-sensitive algorithms. We show that the engineering choices collectively reduce analysis time by factors of up to 10x-15x in our larger programs, and have found that the savings grow strongly with program size. Finally, we briefly summarize results demonstrating the precision of the analysis.
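Heap cloning names heap objects by (acyclic) call path rather than by allocation site alone, which matters for allocation wrappers (a generic sketch):

    #include <stdlib.h>

    /* One syntactic malloc site, reached from two call paths. */
    static void *xalloc(size_t n) { return malloc(n); }

    int main(void) {
        int    *xs = xalloc(4 * sizeof(int));    /* clone: main -> xalloc, call 1 */
        double *ys = xalloc(4 * sizeof(double)); /* clone: main -> xalloc, call 2 */
        /* Without cloning, both objects collapse into one abstract heap
           node and xs/ys appear to alias; with cloning they stay distinct. */
        free(xs);
        free(ys);
        return 0;
    }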
Article
Full-text available
Propositional bounded model checking has been applied successfully to verify embedded software but is limited by the increasing propositional formula size and the loss of structure during the translation. These limitations can be reduced by encoding word-level information in theories richer than propositional logic and using SMT solvers for the generated verification conditions. Here, we investigate the application of different SMT solvers to the verification of embedded software written in ANSI-C. We have extended the encodings from previous SMT-based bounded model checkers to provide more accurate support for finite variables, bit-vector operations, arrays, structures, unions and pointers. We have integrated the CVC3, Boolector, and Z3 solvers with the CBMC front-end and evaluated them using both standard software model checking benchmarks and typical embedded applications from telecommunications, control systems and medical devices. The experiments show that our approach can analyze larger problems and substantially reduce the verification time.
Conference Paper
Full-text available
We describe LLVM (low level virtual machine), a compiler framework designed to support transparent, lifelong program analysis and transformation for arbitrary programs, by providing high-level information to compiler transformations at compile-time, link-time, run-time, and in idle time between runs. LLVM defines a common, low-level code representation in static single assignment (SSA) form, with several novel features: a simple, language-independent type-system that exposes the primitives commonly used to implement high-level language features; an instruction for typed address arithmetic; and a simple mechanism that can be used to implement the exception handling features of high-level languages (and setjmp/longjmp in C) uniformly and efficiently. The LLVM compiler framework and code representation together provide a combination of key capabilities that are important for practical, lifelong analysis and transformation of programs. To our knowledge, no existing compilation approach provides all these capabilities. We describe the design of the LLVM representation and compiler framework, and evaluate the design in three ways: (a) the size and effectiveness of the representation, including the type information it provides; (b) compiler performance for several interprocedural problems; and (c) illustrative examples of the benefits LLVM provides for several challenging compiler problems.
Conference Paper
Cascade is a program static analysis tool developed at New York University. Cascade takes as input a program and a control file. The control file specifies one or more assertions to be checked together with restrictions on program behaviors. The tool generates verification conditions for the specified assertions and checks them using an SMT solver which either produces a proof or gives a concrete trace showing how an assertion can fail. Version 2.0 supports the majority of standard C features except for floating point. It can be used to verify both memory safety as well as user-defined assertions. In this paper, we describe the Cascade system including some of its distinguishing features such as its support for different memory models (trading off precision for scalability) and its ability to reason about linked data structures.
Article
We present a tool for the formal verification of ANSI-C programs using Bounded Model Checking (BMC). The emphasis is on usability: the tool supports almost all ANSI-C language features, including pointer constructs, dynamic memory allocation, recursion, and the float and double data types. From the perspective of the user, the verification is highly automated: the only input required is the BMC bound. The tool is integrated into a graphical user interface. This is essential for presenting long counterexample traces: the tool allows stepping through the trace in the same way a debugger allows stepping through a program.
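In such a bounded model checker, loops are unwound up to the user-supplied bound before the program is translated into a decision problem; a sketch of what gets checked (a generic example):

    #include <assert.h>

    int main(void) {
        int i, sum = 0;
        for (i = 0; i < 10; i++)   /* unwound 10 times if the bound allows */
            sum += i;
        assert(sum == 45);         /* provable once the loop is fully unwound;
                                      a bound that is too small can mask violations */
        return 0;
    }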
Conference Paper
Static verification traditionally produces yes/no answers. It either provides a proof that a piece of code meets a property, or a counterexample showing that the property can be violated. Hence, the progress of static verification is hard to measure. Unlike in testing, where coverage metrics can be used to track progress, static verification does not provide any intermediate result until the proof of correctness can be computed. This is particularly problematic because of the inevitable incompleteness of static verifiers. To overcome this, we propose a gradual verification approach, GraVy. For a given piece of Java code, GraVy partitions the statements into those that are unreachable, or from which exceptional termination is impossible, inevitable, or possible. Further analysis can then focus on the latter case. That is, even though some statements may still terminate exceptionally, GraVy still computes a partial result. This allows us to measure the progress of static verification. We present an implementation of GraVy and evaluate it on several open source projects.
Conference Paper
We investigate the algorithmic feasibility of checking whether concurrent implementations of shared-memory objects adhere to their given sequential specifications; sequential consistency, linearizability, and conflict serializability are the canonical variations of this problem. While verifying sequential consistency of systems with unbounded concurrency is known to be undecidable, we demonstrate that conflict serializability, and linearizability with fixed linearization points, are EXPSPACE-complete, while the general linearizability problem is undecidable. Our (un)decidability proofs, besides bestowing novel theoretical results, also reveal novel program exploration strategies. For instance, we show that every violation of conflict serializability is captured by a conflict cycle whose length is bounded independently of the number of concurrent operations. This suggests an incomplete detection algorithm which only remembers a small subset of conflict edges, and which can be made complete by increasing the number of remembered edges to the cycle-length bound. Similarly, our undecidability proof for linearizability suggests an incomplete detection algorithm which limits the number of “barriers” bisecting non-overlapping operations. Our decidability proof of bounded-barrier linearizability is interesting on its own, as it reduces the consideration of all possible operation serializations to numerical constraint solving. The literature seems to confirm that most violations are detectable by considering very few conflict edges or barriers.
Conference Paper
Recently, software verification has been used to prove the presence of contradictions in source code, and thus to detect potential weaknesses in the code or to assist compiler optimization. Compared to the verification of correctness properties, the translation from source code to logic can be kept very simple, yielding formulas that are easy for automated theorem provers to solve. In this paper, we present a translation of Java into logic that is suitable for proving the presence of contradictions in code. We show that the translation, which is based on the Jimple language, can be used to analyze real-world programs, and we discuss some issues that arise from differences between Java code and its bytecode.
Conference Paper
In this paper, we present Ufo, a framework and a tool for verifying (and finding bugs in) sequential C programs. The framework is built on top of the LLVM compiler infrastructure and is targeted at researchers designing and experimenting with verification algorithms. It allows definition of different abstract post operators, refinement strategies and exploration strategies. We have built three instantiations of the framework: a predicate abstraction-based version, an interpolation-based version, and a combined version which uses a novel and powerful combination of interpolation-based and predicate abstraction-based algorithms.
Article
This note defines BoogiePL, an intermediate language for program analysis and program verification. The language is a simple coarsely typed imperative language with procedures and arrays, plus support for introducing mathematical functions and declaring properties of these functions. BoogiePL can be used to represent programs written in an imperative source language (like an object-oriented .NET language), along with a logical encoding of the semantics of such a source language. From the resulting BoogiePL program, one can then generate verification conditions or perform other program analyses such as the inference of program invariants. In this way, BoogiePL also serves as a programming-notation front end to theorem provers. BoogiePL is accepted as input to Boogie, the Spec# static program verifier.
Conference Paper
Because of its critical importance underlying all other software, low-level system software is among the most important targets for formal verification. Low-level systems software must sometimes make type-unsafe memory accesses, but because of the vast size of available heap memory in today’s computer systems, faithfully representing each memory allocation and access does not scale when analyzing large programs. Instead, verification tools rely on abstract memory models to represent the program heap. This paper reports on two related investigations to develop an accurate (i.e., providing a useful level of soundness and precision) and scalable memory model: First, we compare a recently introduced memory model, specifically designed to model low-level memory accesses in systems code more accurately, to an older, widely adopted memory model. Unfortunately, we find that the newer memory model scales poorly compared to the earlier, less accurate model. Next, we investigate how to improve the soundness of the less accurate model. A direct approach is to add assertions to the code checking that each memory access does not break the assumptions of the memory model, but this causes verification complexity to blow up. Instead, we develop a novel, extremely lightweight static analysis that quickly and conservatively guarantees that most memory accesses safely respect the assumptions of the memory model, thereby eliminating almost all of these extra type-checking assertions. Furthermore, this analysis allows us to automatically create memory models that flexibly use the more scalable memory model for most of memory while resorting to a more accurate model for memory accesses that might need it.
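The kind of access that breaks the assumptions of a typed ("one map per type") memory model, and hence needs either a checking assertion or the more accurate byte-level model, looks like this (a generic illustration):

    /* Writes an object's bytes through one type and reads them through
       another. A memory model keeping disjoint int and float maps would
       wrongly treat the two accesses as touching unrelated locations. */
    float reinterpret_bits(int bits) {
        int x = bits;
        return *(float *)&x;   /* type-unsafe read of x's representation */
    }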
Conference Paper
Automatically detecting bugs in programs has been a long-held goal in software engineering. Many techniques exist, trading off varying levels of automation, thoroughness of coverage of program behavior, precision of analysis, and scalability to large code bases. This paper presents the CALYSTO static checker, which achieves an unprecedented combination of precision and scalability in a completely automatic extended static checker. CALYSTO is interprocedurally path-sensitive, fully context-sensitive, and bit-accurate in modeling data operations (comparable coverage and precision to very expensive formal analyses), yet scales comparably to the leading, less precise, static-analysis-based tool for similar properties. Using CALYSTO, we have discovered dozens of bugs, completely automatically, in hundreds of thousands of lines of production, open-source applications, with a very low rate of false error reports. This paper presents the design decisions, algorithms, and optimizations behind CALYSTO's performance.
Conference Paper
Program verification systems typically transform a program into a logical expression which is then fed to a theorem prover. The logical expression represents the weakest precondition of the program relative to its specification; when (and if!) the theorem prover is able to prove the expression, then the program is considered correct. Computing such a logical expression for an imperative, structured program is straightforward, although there are issues having to do with loops, with the efficiency of the computation, and with the complexity of the resulting formula with respect to the theorem prover. This paper presents a novel approach for computing the weakest precondition of an unstructured program that is sound even in the presence of loops. The computation is efficient, and the resulting logical expression provides more leeway for the theorem prover to attack the proof efficiently.
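For reference, the standard rules for primitive statements that such computations build on (in the usual notation; the paper's contribution is making the computation sound and efficient for arbitrary unstructured, loop-containing control flow):

    \[
    \begin{aligned}
      wp(x := e,\; Q) &= Q[e/x] \\
      wp(\mathsf{assert}\ e,\; Q) &= e \wedge Q \\
      wp(\mathsf{assume}\ e,\; Q) &= e \Rightarrow Q \\
      wp(S;\,T,\; Q) &= wp(S,\; wp(T,\, Q))
    \end{aligned}
    \]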
Conference Paper
Bounded model checking (BMC) of C and C++ programs is challenging due to the complex and intricate syntax and semantics of these programming languages. The BMC tool LLBMC presented in this paper thus uses the LLVM compiler framework in order to translate C and C++ programs into LLVM's intermediate representation. The resulting code is then converted into a logical representation and simplified using rewrite rules. The simplified formula is finally passed to an SMT solver. In contrast to many other tools, LLBMC uses a flat, bit-precise memory model. It can thus precisely model, e.g., memory-based re-interpret casts as used in C and static/dynamic casts as used in C++. An empirical evaluation shows that LLBMC compares favorably to the related BMC tools CBMC and ESBMC.
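A flat, bit-precise memory model treats the heap as one array of bytes, which makes representation-level accesses like the following precise (a generic illustration, not LLBMC input):

    #include <stdint.h>
    #include <string.h>

    /* Extract the least significant byte of a 32-bit value via its raw
       byte representation; natural under a byte-array memory model. */
    uint8_t low_byte(uint32_t v) {
        uint8_t bytes[4];
        memcpy(bytes, &v, sizeof v);   /* re-interpret as raw bytes */
        return bytes[0];               /* low byte on a little-endian target */
    }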
Conference Paper
Many approaches to software verification are currently semi-automatic: a human must provide key logical insights, e.g., loop invariants, class invariants, and frame axioms that limit the scope of changes that must be analyzed. This paper describes a technique for automatically inferring frame axioms of procedures and loops using static analysis. The technique builds on a pointer analysis that generates limited information about all data structures in the heap. Our technique uses that information to over-approximate a potentially unbounded set of memory locations modified by each procedure/loop; this over-approximation is a candidate frame axiom. We have tested this approach on the buffer-overflow benchmarks from ASE 2007. With manually provided specifications and invariants/axioms, our tool could verify/falsify 226 of the 289 benchmarks. With our automatically inferred frame axioms, the tool could verify/falsify 203 of the 289, demonstrating the effectiveness of our approach.
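For example, a candidate frame axiom for a simple loop bounds the locations it may modify (a sketch; the axiom is stated informally in a comment, not in any particular tool's syntax):

    /* Candidate frame axiom (informal): the loop modifies only
       a[0..n-1]; every other memory location keeps its value. */
    void zero_prefix(int *a, int n) {
        for (int i = 0; i < n; i++)
            a[i] = 0;
    }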
Conference Paper
We present a new symbolic execution tool, KLEE, capable of automatically generating tests that achieve high coverage on a diverse set of complex and environmentally-intensive programs. We used KLEE to thoroughly check all 89 stand-alone programs in the GNU COREUTILS utility suite, which form the core user-level environment installed on millions of Unix systems, and arguably are the single most heavily tested set of open-source programs in existence. KLEE-generated tests achieve high line coverage (on average over 90% per tool; median: over 94%) and significantly beat the coverage of the developers' own hand-written test suite. When we did the same for 75 equivalent tools in the BUSYBOX embedded system suite, results were even better, including 100% coverage on 31 of them. We also used KLEE as a bug finding tool, applying it to 452 applications (over 430K total lines of code), where it found 56 serious bugs, including three in COREUTILS that had been missed for over 15 years. Finally, we used KLEE to crosscheck purportedly identical BUSYBOX and COREUTILS utilities, finding functional correctness errors and a myriad of inconsistencies.
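A typical harness for this style of symbolic execution marks an input as symbolic and lets the tool explore both sides of each branch; for KLEE this looks roughly like the following (a sketch based on KLEE's documented intrinsics):

    #include <assert.h>
    #include <klee/klee.h>

    int main(void) {
        int x;
        klee_make_symbolic(&x, sizeof x, "x");  /* x ranges over all int values */
        if (x > 100)
            assert(x != 101);  /* KLEE produces the concrete input x == 101 */
        return 0;
    }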
Conference Paper
Our goal is the verification of C programs at the source code level using formal proof tools. Programs are specified using annotations such as pre- and post-conditions and global invariants. We present an original approach that allows one to formally prove that a function implementation satisfies its specification and is free of null pointer dereferencing and out-of-bounds array accesses. The method is not bound to a particular back-end theorem prover. A significant part of the ANSI C language is supported, including pointer arithmetic and possible pointer aliasing. We describe a prototype tool and give some experimental results.
Article
The Toolkit for Accurate Scientific Software (TASS) is a suite of integrated tools for the formal verification of programs used in computational science, including numerically-intensive message-passing-based parallel programs. While TASS can verify a number of standard safety properties (such as absence of deadlocks and out-of-bounds array indexing), its most powerful feature is the ability to establish that two programs are functionally equivalent. These properties are verified by performing an explicit state enumeration of a model of the program(s). In this model, symbolic expressions are used to represent the inputs and the values of variables. TASS uses novel techniques to simplify the symbolic representation of the state and to reduce the number of states explored and saved. The TASS front-end supports a large subset of C, including (multi-dimensional) arrays, structs, dynamically allocated data, pointers and pointer arithmetic, functions and recursion, and other commonly used language constructs. A number of experiments on small but realistic numerical programs show that TASS can scale to reasonably large configurations and process counts. TASS is open source software distributed under the GNU Public License. The Java source code, examples, experimental results, and reference materials are all available at http://vsl.cis.udel.edu/tass.
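The functional-equivalence question can be pictured with two implementations that must agree on all inputs (a generic sketch, not TASS's input format):

    #include <assert.h>

    /* Two implementations to be shown functionally equivalent. */
    static int sum_loop(int n) {
        int s = 0;
        for (int i = 1; i <= n; i++) s += i;
        return s;
    }
    static int sum_closed(int n) { return n * (n + 1) / 2; }

    int main(void) {
        /* The claim is "for all n >= 0"; a bounded test only samples it,
           whereas symbolic state enumeration covers all inputs it models. */
        for (int n = 0; n < 100; n++)
            assert(sum_loop(n) == sum_closed(n));
        return 0;
    }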
Conference Paper
Configurable software verification is a recent concept for expressing different program analysis and model checking approaches in one single formalism. This paper presents CPAchecker, a tool and framework that aims at easy integration of new verification components. Every abstract domain, together with the corresponding operations, is required to implement the interface of configurable program analysis (CPA). The main algorithm is configurable to perform a reachability analysis on arbitrary combinations of existing CPAs. The major design goal during development was to provide a framework for developers that is flexible and easy to extend. We hope that researchers find it convenient and productive to implement new verification ideas and algorithms using this platform, and that it advances the field by making it easier to perform practical experiments. The tool is implemented in Java and runs as a command-line tool or as an Eclipse plug-in. We evaluate the efficiency of our tool on benchmarks from the software model checker BLAST. The first released version of CPAchecker implements CPAs for predicate abstraction, octagon, and explicit-value domains. Binaries and the source code of CPAchecker are publicly available as free software.
Multi-prover verification of C programs
  • J.-C. Filliâtre
  • C. Marché
Cseq: A sequentialization tool for C (competition contribution)
  • B. Fischer
  • O. Inverso
  • G. Parlato