Conference Paper

Optimal Dynamic Partial Order Reduction

Abstract

Stateless model checking is a powerful technique for program verification, which however suffers from an exponential growth in the number of explored executions. A successful technique for reducing this number, while still maintaining complete coverage, is Dynamic Partial Order Reduction (DPOR). We present a new DPOR algorithm, which is the first to be provably optimal in that it always explores the minimal number of executions. It is based on a novel class of sets, called source sets, which take over the role of persistent sets in previous algorithms. First, we show how to modify an existing DPOR algorithm to work with source sets, resulting in an efficient and simple-to-implement algorithm. Second, we extend this algorithm with a novel mechanism, called wakeup trees, that makes it possible to achieve optimality. We have implemented both algorithms in a stateless model checking tool for Erlang programs. Experiments show that source sets significantly increase the performance and that wakeup trees incur only a small overhead in both time and space.
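To make the gap between "all executions" and "the minimal number of executions" concrete, the following brute-force sketch enumerates every interleaving of a toy program and groups the interleavings into Mazurkiewicz traces. It is not the DPOR algorithm of the paper. The program `PROGRAM`, the helper names, and the dependence rule (same thread, or same variable with at least one write) are illustrative assumptions.

```python
from itertools import permutations

# Hypothetical toy program: one event per thread, given as (operation, variable).
PROGRAM = {
    "p": [("write", "x")],
    "q": [("read", "y")],
    "r": [("write", "x")],
}

def events(program):
    """Name each event by (thread id, index within that thread)."""
    return [(tid, i) for tid, ops in program.items() for i in range(len(ops))]

def interleavings(program):
    """Yield every total order of events that respects per-thread program order."""
    for run in permutations(events(program)):
        pos = {e: k for k, e in enumerate(run)}
        if all(pos[(t, i)] < pos[(t, i + 1)]
               for t, ops in program.items() for i in range(len(ops) - 1)):
            yield run

def dependent(program, a, b):
    """Assumed dependence: same thread, or same variable with at least one write."""
    (ta, ia), (tb, ib) = a, b
    (op_a, var_a), (op_b, var_b) = program[ta][ia], program[tb][ib]
    return ta == tb or (var_a == var_b and "write" in (op_a, op_b))

def trace_key(program, run):
    """Two runs are Mazurkiewicz-equivalent iff they order dependent pairs alike."""
    return frozenset((a, b) for k, a in enumerate(run) for b in run[k + 1:]
                     if dependent(program, a, b))

runs = list(interleavings(PROGRAM))
traces = {trace_key(PROGRAM, run) for run in runs}
print(len(runs), "interleavings,", len(traces), "Mazurkiewicz traces")
```

For this toy program the 6 interleavings collapse into 2 traces, because only the two writes to `x` conflict. An optimal exploration would need one execution per trace, whereas this sketch still enumerates all interleavings before grouping them.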
... In the last decade, new developments proposed optimal solutions (see e.g. [1,2,24]). At the same time, those techniques have been tuned for relaxed memory models where even local actions may be independent (see e.g. ...
... In this paper, we propose adaptations of the ODPOR algorithm [1]. We want to improve the practical efficiency of ODPOR when searching for bugs. ...
... Optimal Dynamic Partial Order Reduction (ODPOR) [1] is a recent DPOR technique. Like other DPOR algorithms, ODPOR is based on a stateless depth-first exploration of a reduced state space. ...
Preprint
Assessing the correctness of distributed and parallel applications is notoriously difficult due to the complexity of the concurrent behaviors and the difficulty of reproducing bugs. In this context, Dynamic Partial Order Reduction (DPOR) techniques have proved successful in exploiting concurrency to verify applications without exploring all their behaviors. However, they may lack efficiency when tracking non-systematic bugs of real-size applications. In this paper, we suggest two adaptations of the Optimal Dynamic Partial Order Reduction (ODPOR) algorithm with a particular focus on bug finding and explanation. The first adaptation is an out-of-order version called RFS ODPOR which avoids being stuck in uninteresting large parts of the state space. Once a bug is found, the second adaptation takes advantage of ODPOR principles to efficiently find the origins of the bug.
... Dynamic partial-order reduction (DPOR) [10,1,19] is a mature approach to mitigate the state explosion problem in stateless model checking of multithreaded programs. DPORs are based on Mazurkiewicz trace theory [13], a true-concurrency semantics where the set of executions of the program is partitioned into equivalence classes known as Mazurkiewicz traces (M-traces). ...
... In a DPOR, this partitioning is defined by an independence relation over concurrent actions that is computed dynamically and the method explores executions which are representatives of M-traces. The exploration is sound when it explores all M-traces, and it is considered optimal [1] when it explores each M-trace only once. ...
... Since two independent actions might have to be explored from the same state in order to explore all M-traces, a DPOR algorithm uses independence to compute a provably-sufficient subset of the enabled transitions to explore for each state encountered. Typically this involves the combination of forward reasoning (persistent sets [11] or source sets [1,4]) with backward reasoning (sleep sets [11]) to obtain a more efficient exploration. ... (Footnote: This paper is the extended version of a paper with the same title that appeared in the proceedings of CAV '18. Footnote 4: Shortly after this extended version was made public, we were made aware of the recent publication of another paper [3] which contains an independently-discovered example program with the same characteristics.)
Preprint
A dynamic partial order reduction (DPOR) algorithm is optimal when it always explores at most one representative per Mazurkiewicz trace. Existing literature suggests that the reduction obtained by the non-optimal, state-of-the-art Source-DPOR (SDPOR) algorithm is comparable to optimal DPOR. We show the first program with $\mathcal{O}(n)$ Mazurkiewicz traces where SDPOR explores $\mathcal{O}(2^n)$ redundant schedules and identify the cause of the blow-up as an NP-hard problem. Our main contribution is a new approach, called Quasi-Optimal POR, that can arbitrarily approximate an optimal exploration using a provided constant k. We present an implementation of our method in a new tool called Dpu using specialised data structures. Experiments with Dpu, including Debian packages, show that optimality is achieved with low values of k, outperforming state-of-the-art tools.
... SMC must implement strategies to reduce the number of executions (event orderings) that are evaluated, as the number of possible thread schedulings increases exponentially with program length and the number of threads [14]. One of the most notable methods is partial order reduction [31][32][33], modified as dynamic partial order reduction (DPOR) for SMC [34]. Two executions can be considered equivalent if they result in the same ordering of conflicting statement executions (referred to as events). ...
... In this study, the performance of the three proposed variants was compared with the third algorithm from [14], referred to as "ConsistencyDecision", by analyzing a collection of parametric concurrent programs containing a variable number of threads. These programs, sourced from SCTBench [42], sv-comp [43], and lastzero [34], were written in the C programming language. This program collection encompasses examples with both low and high numbers of Mazurkiewicz traces, depending on the trace count of each program. ...
Article
Verifying sequential consistency (SC) in concurrent programs is computationally challenging due to the exponential growth of possible interleavings among read and write operations. Many of these interleavings produce identical outcomes, rendering exhaustive verification approaches inefficient and computationally expensive, especially as thread counts increase. To mitigate this challenge, this study introduces a novel approach that efficiently verifies SC by identifying a minimal subset of valid event orderings. The proposed method iteratively focuses on ordering write events and evaluates their compatibility with SC conditions, including program order, read-from (rf) relations, and SC semantics, thereby significantly reducing redundant computations. Corresponding read events are subsequently integrated according to program order once the validity of write events has been confirmed, enabling rapid identification of violations to SC criteria. Three algorithmic variants of this approach were developed and empirically evaluated. The final variant exhibited superior performance, achieving substantial improvements in execution time—ranging from 31.919% to 99.992%—compared to the optimal existing practical SC verification algorithms. Additionally, comparative experiments demonstrated that the proposed approach consistently outperforms other state-of-the-art methods in both efficiency and scalability.
... Addressing the control state explosion resulting from concurrency poses a significant challenge to concurrent program verification. Several techniques have recently been studied to overcome this problem, including stateless model checking [1,4,9,24], compositional reasoning [14,22,29,32], bounded model checking [3,10,20,25,38,43], and abstraction refinement [11,12,21,29,35], etc. The general idea of stateless model checking is to employ partial order reduction (POR) or dynamic partial order reduction (DPOR) [1,4,9,41] to explore only non-redundant interleavings. ...
Preprint
Bounded model checking is among the most efficient techniques for the automatic verification of concurrent programs. However, encoding all possible interleavings often requires a huge and complex formula, which significantly limits the scalability. This paper proposes a novel and efficient abstraction refinement method for multi-threaded program verification. Observing that the huge formula is usually dominated by the exact encoding of the scheduling constraint, this paper proposes a \tsc based abstraction refinement method, which avoids the huge and complex encoding of BMC. In addition, to obtain an effective refinement, we have devised two graph-based algorithms over event order graph for counterexample validation and refinement generation, which can always obtain a small yet effective refinement constraint. Enhanced by two constraint-based algorithms for counterexample validation and refinement generation, we have proved that our method is sound and complete w.r.t. the given loop unwinding depth. Experimental results on \svcompc benchmarks indicate that our method is promising and significantly outperforms the existing state-of-the-art tools.
... Runtime predictive analysis has been extended to other properties such as deadlocks, atomicity violations as well as more general properties [5,6,42,61], but is known to be intractable in general [22,32,32,40] and HB-based race detection, based on Mazurkiewicz-style trace-based reasoning has remained popular because of the performance benefits it offers. Concurrency testing approaches, on the other hand, aim to explore bugs by executing the underlying program systematically multiple times using randomization [13,65,67], together with feedback-guidance [63] or in a strictly enumerative manner [2][3][4][31]. ...
Preprint
Dynamic race detection based on the happens before (HB) partial order has now become the de facto approach to quickly identify data races in multi-threaded software. Most practical implementations for detecting these races use timestamps to infer causality between events and detect races based on these timestamps. Such an algorithm updates timestamps (stored in vector clocks) at every event in the execution, and is known to induce excessive overhead. Random sampling has emerged as a promising algorithmic paradigm to offset this overhead. It offers the promise of making sound race detection scalable. In this work we consider the task of designing an efficient sampling-based race detector with low overhead for timestamping when the number of sampled events is much smaller than the total events in an execution. To solve this problem, we propose (1) a new notion of freshness timestamp, (2) a new data structure to store timestamps, and (3) an algorithm that uses a combination of them to reduce the cost of timestamping in sampling-based race detection. Further, we prove that our algorithm is close to optimal: the number of vector clock traversals is bounded by the number of sampled events and number of threads, and further, on any given dynamic execution, the cost of timestamping due to our algorithm is close to the amount of work any timestamping-based algorithm must perform on that execution, that is, it is instance optimal. Our evaluation on real world benchmarks demonstrates the effectiveness of our proposed algorithm over prior timestamping algorithms that are agnostic to sampling.
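The baseline that this abstract sets out to improve, namely vector-clock timestamping at every event, can be sketched as follows. This is a minimal illustration of happens-before race detection restricted to write-write races and lock synchronization. It is not the sampling algorithm of the paper, and the class `HBDetector` and its method names are hypothetical.

```python
from collections import defaultdict

class HBDetector:
    """Toy happens-before detector for write-write races (illustrative only)."""

    def __init__(self, threads):
        self.threads = list(threads)
        # clock[t][u]: latest event of thread u known to happen before t's next event
        self.clock = {t: {u: (1 if u == t else 0) for u in self.threads}
                      for t in self.threads}
        # Clock stored in each lock at its last release
        self.lock_clock = defaultdict(lambda: {u: 0 for u in self.threads})
        self.last_write = {}  # variable -> (writer thread, clock snapshot)
        self.races = []

    def acquire(self, t, lock):
        # Join: the acquirer learns everything the last releaser knew
        for u in self.threads:
            self.clock[t][u] = max(self.clock[t][u], self.lock_clock[lock][u])

    def release(self, t, lock):
        self.lock_clock[lock] = dict(self.clock[t])
        self.clock[t][t] += 1  # later events of t are not covered by this release

    def write(self, t, var):
        prev = self.last_write.get(var)
        if prev is not None:
            writer, snapshot = prev
            # The earlier write is ordered before this one only if t has
            # already observed the writer's timestamp at that write
            if snapshot[writer] > self.clock[t][writer]:
                self.races.append((var, writer, t))
        self.last_write[var] = (t, dict(self.clock[t]))

# Two writes protected by the same lock: ordered, so no race is reported
safe = HBDetector(["t1", "t2"])
safe.acquire("t1", "l"); safe.write("t1", "x"); safe.release("t1", "l")
safe.acquire("t2", "l"); safe.write("t2", "x"); safe.release("t2", "l")

# Two unsynchronized writes: unordered, so a race is reported
racy = HBDetector(["t1", "t2"])
racy.write("t1", "x")
racy.write("t2", "x")
print(safe.races, racy.races)
```

Every `acquire`, `release`, and `write` here touches a full vector clock, which is the per-event cost that a sampling-based detector aims to avoid.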
... Dynamic partial order reduction (DPOR) techniques have been successfully applied to model check concurrent programs across different languages and contexts [1,2,10]. For Rust specifically, several tools target concurrency bug detection, though with different approaches and tradeoffs compared to ours. ...
Preprint
RustMC is a stateless model checker that enables verification of concurrent Rust programs. As both Rust and C/C++ compile to LLVM IR, RustMC builds on GenMC which provides a verification framework for LLVM IR. This enables the automatic verification of Rust code and any C/C++ dependencies. This tool paper presents the key challenges we addressed to extend GenMC. These challenges arise from Rust's unique compilation strategy and include intercepting threading operations, handling memory intrinsics and uninitialized accesses. Through case studies adapted from real-world programs, we demonstrate RustMC's effectiveness at finding concurrency bugs stemming from unsafe Rust code, FFI calls to C/C++, and incorrect use of atomic operations.
... In the future, we also plan to consider concurrency with an appropriate technique for handling the state explosion problem. For instance, we may employ a partial order reduction technique [1] to obtain the minimal set of concurrent behaviours for a given program, and then generate the associated executions using our interpreter. ...
Preprint
Dynamically typed languages, like Erlang, allow developers to quickly write programs without explicitly providing any type information on expressions or function definitions. However, this feature makes those languages less reliable than statically typed languages, where many runtime errors can be detected at compile time. In this paper, we present a preliminary work on a tool that, by using the well-known techniques of metaprogramming and symbolic execution, can be used to perform bounded verification of Erlang programs. In particular, by using Constraint Logic Programming, we develop an interpreter that, given an Erlang program and a symbolic input for that program, returns answer constraints that represent sets of concrete data for which the Erlang program generates a runtime error.
... Recently, SMC for C/C++ has made significant advances [1,2,15,16,17,21,23,24]. It was largely due to the development of axiomatic memory models [6,8,20] and dynamic partial order reduction [3,4,10]. GenMC [17,19], the state-of-the-art algorithm for SMC and its implementation, performs optimal state-space exploration by using a graph-based compact representation of program traces. ...
Chapter
Stateless model checking (SMC) is crucial for productivity in verified concurrent programming, and its recent developments for C/C++ and weak memory models are remarkable. The state-of-the-art SMC for C, GenMC, efficiently verifies C programs based on C11 atomics and pthreads. However, it does not support mixed-size accesses, accesses to the same memory region with different-sized types, even though they are ubiquitous in C/C++, particularly the code for memory management. As a result, GenMC does not work for C/C++ programs containing memory management. To resolve this problem, we develop a method of adapting GenMC to mixed-size accesses preserving its optimality. We experimentally evaluate the efficiency of our extended implementation of GenMC and its efficacy for memory management programs.
Article
State-of-the-art model checkers employing dynamic partial order reduction (DPOR) can verify concurrent programs under a wide range of memory models such as sequential consistency (SC), total store order (TSO), release-acquire (RA), and the repaired C11 memory model (RC11) in an optimal and memory-efficient fashion. Unfortunately, these DPOR techniques cannot be applied in an optimal fashion to programs with mixed-sized accesses (MSA), where atomic instructions access different (sets of) bytes belonging to the same word. Such patterns naturally arise in real life code with C/C++ union types, and are even used in a concurrent setting. In this paper, we introduce Mixer, an optimal DPOR algorithm for MSA programs that allows (multi-byte) reads to be revisited by multiple writes together. We have implemented Mixer in the GenMC model checker, enabling (for the first time) the automatic verification of C/C++ code with mixed-size accesses. Our results also extend to the more general case of transactional programs provided that the set of read accesses performed by a transaction can be dynamically overapproximated at the beginning of the transaction.
Preprint
Atomicity violation is one of the most serious types of bugs in concurrent programs. Synchronizations are commonly used to enforce atomicity. However, it is very challenging to place synchronizations correctly and sufficiently due to complex thread interactions and large input space. This paper presents VeriFix, a new approach for verifying atomicity violation fixes. Given a buggy trace that exposes an atomicity violation and a corresponding fix, VeriFix effectively verifies if the fix introduces sufficient synchronizations to repair the atomicity violation without introducing new deadlocks. The key idea is that VeriFix transforms the fix verification problem into a property verification problem, in which both the observed atomicity violation and potential deadlocks are encoded as a safety property, and both the inputs and schedules are encoded as symbolic constraints. By reasoning the conjoined constraints with an SMT solver, VeriFix systematically explores all reachable paths and verifies if there exists a concrete schedule+input combination to manifest the intended atomicity or any new deadlocks. We have implemented and evaluated VeriFix on a collection of real-world C/C++ programs. The result shows that VeriFix significantly outperforms the state-of-the-art.
Conference Paper
To detect hard-to-find concurrency bugs, testing tools try to systematically explore all possible interleavings of the transitions in a concurrent program. Unfortunately, because of the nondeterminism in concurrent programs, exhaustively exploring all interleavings is time-consuming and often computationally intractable. Speeding up such tools requires pruning the state space explored. Partial-order reduction (POR) techniques can substantially prune the number of explored interleavings. These techniques require defining a dependency relation on transitions in the program, and exploit independency among certain transitions to prune the state space. We observe that actor systems, a prevalent class of programs where computation entities communicate by exchanging messages, exhibit a dependency relation among co-enabled transitions with an interesting property: transitivity. This paper introduces a novel dynamic POR technique, TransDPOR, that exploits the transitivity of the dependency relation in actor systems. Empirical results show that leveraging transitivity speeds up exploration by up to two orders of magnitude compared to existing POR techniques.
Conference Paper
We present the techniques used in Concuerror, a systematic testing tool able to find and reproduce a wide class of concurrency errors in Erlang programs. We describe how we take advantage of the characteristics of Erlang's actor model of concurrency to selectively instrument the program under test and how we subsequently employ a stateless search strategy to systematically explore the state space of process interleaving sequences triggered by unit tests. To ameliorate the problem of combinatorial explosion, we propose a novel technique for avoiding process blocks and describe how we can effectively combine it with preemption bounding, a heuristic algorithm for reducing the number of explored interleaving sequences. We also briefly discuss issues related to soundness, completeness and effectiveness of techniques used by Concuerror.
Conference Paper
Testing multi-threaded programs is hard due to the state explosion problem arising from the different interleavings of concurrent operations. The dynamic partial order reduction (DPOR) algorithm by Flanagan and Godefroid is one solution to reducing this problem. We present a modification to this algorithm that allows it to exploit the commutativity of read operations and provide further reduction. To enable testing of multi-threaded programs that also take input we show that it is possible to combine DPOR with concolic testing. We have implemented our modified DPOR algorithm in the LCT concolic testing tool. We have also implemented the sleep set algorithm, which can be used along with DPOR to provide further reduction. As the LCT tool was designed for distributed use we have modified the sleep set algorithm for use in a distributed testing client-server setting.
Article
In multithreaded programs both environment input data and the nondeterministic interleavings of concurrent events can affect the behavior of the program. One approach to systematically explore the nondeterminism caused by input data is dynamic symbolic execution. For testing multithreaded programs we present a new approach that combines dynamic symbolic execution with unfoldings, a method originally developed for Petri nets but also applied to many other models of concurrency. We provide an experimental comparison of our new approach with existing algorithms combining dynamic symbolic execution and partial-order reductions and show that the new algorithm can explore the reachable control states of each thread with a significantly smaller number of test runs. In some cases the reduction in the number of test runs can even be exponential, allowing programs with long test executions or hard-to-solve constraints generated by symbolic execution to be tested more efficiently.
Article
State-space caching is a verification technique for finite-state concurrent systems. It performs an exhaustive exploration of the state space of the system being checked while storing only all states of just one execution sequence plus as many other previously visited states as available memory allows. So far, this technique has been of little practical significance: it allows one to reduce memory usage by only two to three times, before an unacceptable blow-up of the run-time overhead sets in. The explosion of the run-time requirements is due to redundant multiple explorations of unstored parts of the state space. Indeed, almost all states in the state space of concurrent systems are typically reached several times during the search. In this paper, we present a method to tackle the main cause of this prohibitive state matching: the exploration of all possible interleavings of concurrent executions of the system which all lead to the same state. Then, we show that, in many cases, with this method, most reachable states are visited only once during state-space exploration. This enables one not to store most of the states that have already been visited without incurring too many redundant explorations of parts of the state space, and therefore makes state-space caching a much more attractive verification method. As an example, we were able to completely explore a state space of 250,000 states while storing simultaneously no more than 500 states and with only a three-fold increase of the run-time requirements.
Article
With the advancement of computer technology, highly concurrent systems are being developed. The verification of such systems is a challenging task, as their state space grows exponentially with the number of processes. Partial order reduction is an effective technique to address this problem. It relies on the observation that the effect of executing transitions concurrently is often independent of their ordering. In this paper we present the basic principles behind partial order reduction and its implementation.
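The observation that the effect of executing transitions is often independent of their ordering can be illustrated with a small commutation check. This is a sketch under simplifying assumptions: transitions are modelled as pure functions on a dictionary state, the check is made at a single state only, and enabledness is ignored. The names `inc_x`, `inc_y`, `double_x`, and `commute_at` are hypothetical.

```python
def inc_x(state):
    """Transition of one process: increment shared variable x."""
    return {**state, "x": state["x"] + 1}

def inc_y(state):
    """Transition of another process: increment shared variable y."""
    return {**state, "y": state["y"] + 1}

def double_x(state):
    """Transition that also touches x, so it may conflict with inc_x."""
    return {**state, "x": state["x"] * 2}

def commute_at(state, t1, t2):
    """True if running t1 then t2 reaches the same state as t2 then t1."""
    return t2(t1(state)) == t1(t2(state))

start = {"x": 1, "y": 0}
print(commute_at(start, inc_x, inc_y))     # the two orders agree
print(commute_at(start, inc_x, double_x))  # (1 + 1) * 2 differs from 1 * 2 + 1
```

When two transitions commute, a reduction only needs to explore one of the two orders. A usable independence relation must hold in every state where both transitions are enabled, which this single-state check does not establish.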
Article
We present a new approach to partial-order reduction for model checking software. This approach is based on initially exploring an arbitrary interleaving of the various concurrent processes/threads, and dynamically tracking interactions between these to identify backtracking points where alternative paths in the state space need to be explored. We present examples of multi-threaded programs where our new dynamic partial-order reduction technique significantly reduces the search space, even though traditional partial-order algorithms are helpless.
Article
Testing concurrent programs that accept data inputs is notoriously hard because, besides the large number of possible data inputs, nondeterminism results in an exponentially large number of interleavings of concurrent events. We propose a novel testing algorithm for concurrent programs in which our goal is not only to execute all reachable statements of a program, but to detect all possible data races, and deadlock states. The algorithm uses a combination of symbolic and concrete execution (called concolic execution) to explore all distinct causal structures (or partial order relations among events generated during execution) of a concurrent program. The idea of concolic testing is to use the symbolic execution to generate inputs that direct a program to alternate paths, and to use the concrete execution to guide the symbolic execution along a concrete path. Our algorithm uses the concrete execution to compute the exact race conditions between the events of an execution at runtime. Subsequently, we systematically re-order or permute the events involved in these races by generating new thread schedules as well as generate new test inputs. This way we explore at least one representative from each partial order. We describe jCUTE, a tool implementing the testing algorithm together with the results of applying jCUTE to real-world multithreaded Java applications and libraries. In one of our case studies, we discovered several undocumented potential concurrency-related bugs in the widely used Java collection framework distributed with the Sun Microsystems' JDK 1.4.