Article

Complexity Results for May-Happen-in-Parallel Analysis

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

For concurrent and parallel languages, may-happen-in-parallel (MHP) analysis is useful as a basis for tools such as data race detectors. While many approximate static MHP analyses exist, researchers have published only a few papers on decidability results for MHP analysis. We study MHP analysis for a model of X10, a parallel language with async-finish parallelism. For programs with procedures, we show that the MHP decision problem is decidable in linear time, and hence the set of pairs of actions that may happen in parallel can be computed in cubic time. For programs without procedures, we present a practical recur-sive decision procedure that does multiple-query MHP analysis in cubic time. Our results indicate that MHP analysis is tractable for a standard storeless abstraction of X10 programs.

No full-text available

Request Full-text Paper PDF

To read the full-text of this research,
you can request a copy directly from the authors.

... May-Happen-in-Parallel. There exist may-happen-in-parallel (MHP) analyses for a wide variety of concurrency models (see, e.g., Reference [25] for C Programs, Reference [2] for X10 programs, Reference [18] for Java programs, and References [13,38] for the ABS language). We use the existing MHP analysis for ABS in References [13,38] to approximate the tasks that could start their execution when the processor is released. ...
... There exist may-happen-in-parallel (MHP) analyses for a wide variety of concurrency models (see, e.g., Reference [25] for C Programs, Reference [2] for X10 programs, Reference [18] for Java programs, and References [13,38] for the ABS language). We use the existing MHP analysis for ABS in References [13,38] to approximate the tasks that could start their execution when the processor is released. This analysis infers pairs of program points whose execution might happen in parallel, or in an interleaved way, according to the following definition from Reference [14]. ...
Article
This article presents parallel cost analysis, a static cost analysis targeting to over-approximate the cost of parallel execution in distributed systems. In contrast to the standard notion of serial cost, parallel cost captures the cost of synchronized tasks executing in parallel by exploiting the true concurrency available in the execution model of distributed processing. True concurrency is challenging for static cost analysis, because the parallelism between tasks needs to be soundly inferred, and the waiting and idle processor times at the different locations need to be accounted for. Parallel cost analysis works in three phases: (1) it performs a block-level analysis to estimate the serial costs of the blocks between synchronization points in the program; (2) it then constructs a distributed flow graph (DFG) to capture the parallelism, the waiting, and idle times at the locations of the distributed system; and (3) the parallel cost can finally be obtained as the path of maximal cost in the DFG. We prove the correctness of the proposed parallel cost analysis, and provide a prototype implementation to perform an experimental evaluation of the accuracy and feasibility of the proposed analysis.
... The problem of inferring E P is clearly undecidable in our setting [9], and thus we develop a MHP analysis which statically approximates E P . The analysis is done in two main steps, first it infers method-level MHP information. ...
Chapter
This paper presents a may-happen-in-parallel (MHP) analysis for OO languages based on concurrent objects. In this concurrency model, objects are the concurrency units such that, when a method is invoked on an object o 2 from a task executing on object o 1, statements of the current task in o 1 may run in parallel with those of the (asynchronous) call on o 2, and with those of transitively invoked methods. The goal of the MHP analysis is to identify pairs of statements in the program that may run in parallel in any execution. Our MHP analysis is formalized as a method-level (local) analysis whose information can be modularly composed to obtain application-level (global) information.
... MHP is undecidable in general [18]. Under certain restrictions, MHP analysis for the async-finish parallelism without dynamic barriers has been shown to be in cubic time [13]. MHP analysis has been studied in [9,15,16] for Ada, in [4,14,17] for Java and explored in [1,12,20] for X10. ...
Conference Paper
Full-text available
May-happen-in-parallel analysis is a very important analysis which enables several optimizations in parallel programs. Most of the work on MHP analysis has used forward flow analysis to compute "parallel(n)" | set of nodes which may execute in parallel to a given node "n" | including those approaches that addressed the issue for dynamic barrier languages. We propose a new approach to MHP analysis called "Phase Interval Analysis" (PIA) which computes phase intervals, corresponding to dynamic barriers, in which a statement may execute. PIA enables us to infer an order between two statements whenever it can establish that they can not execute in parallel. Because the ordering relation is transitive, we may also be able to infer indirect synchronization happening between two statements, even when they do not share a barrier. To the best of our knowledge, the issue of indirect synchronization has not been addressed prior to this work. We also compute condition functions under which different instances of the same statement may not execute in parallel, when the statements are nested within loops.
Conference Paper
We present a novel static analysis to infer the peak cost of distributed systems. The different locations of a distributed system communicate and coordinate their actions by posting tasks among them. Thus, the amount of work that each location has to perform can greatly vary along the execution depending on: (1) the amount of tasks posted to its queue, (2) their respective costs, and (3) the fact that they may be posted in parallel and thus be pending to execute simultaneously. The peak cost of a distributed location refers to the maximum cost that it needs to carry out along its execution. Inferring the peak cost is challenging because it increases and decreases along the execution, unlike the standard notion of total cost which is cumulative. Our key contribution is the novel notion of quantified queue configuration which captures the worst-case cost of the tasks that may be simultaneously pending to execute at each location along the execution. A prototype implementation demonstrates the accuracy and feasibility of the proposed peak cost analysis.
Article
Full-text available
We present a novel approach to dynamic datarace detection for multithreaded object-oriented programs. Past techniques for on-the-fly datarace detection either sacrificed precision for performance, leading to many false positive datarace reports, or maintained precision but incurred significant overheads in the range of 3x to 30x. In contrast, our approach results in very few false positives and runtime overhead in the 13% to 42% range, making it both efficient and precise. This performance improvement is the result of a unique combination of complementary static and dynamic optimization techniques.
Conference Paper
Full-text available
Information about which pairs of statements in a concurrent program can execute in parallel is important for optimizing and debugging programs, for detecting anomalies, and for improving the accuracy of data flow analysis. In this paper, we describe a new data flow algorithm that finds a conservative approximation of the set of all such pairs. We have carried out an initial comparison of the precision of our algorithm and that of the most precise of the earlier approaches, Masticola and Ryder's non-concurrency analysis [8], using a sample of 159 concurrent Ada programs that includes the collection assembled by Masticola and Ryder. For these examples, our algorithm was almost always more precise than non-concurrency analysis, in the sense that the set of pairs identified by our algorithm as possibly happening in parallel is a proper subset of the set identified by non-concurrency analysis. In 132 cases, we were able to use reachability analysis to determine exactly the set of pairs of statements that may happen in parallel. For these cases, there were a total of only 10 pairs identified by our algorithm that cannot actually happen in parallel.
Conference Paper
Full-text available
Modeling of runtime threads in static analysis of concurrent programs plays an important role in both reducing the complexity and improving the precision of the analysis. Modeling based on type based techniques merges all runtime instances of a particular type and thereby introduces inaccuracy in the analysis. Other approaches model individual runtime threads explicitly in the analysis and are of high complexity. In this paper we introduce a thread model that is both context and flow sensitive. Individual thread abstractions are identified based on the context and multiplicity of the creation site. The interaction among these abstract threads are depicted in a tree structure known as Thread Creation Tree (TCT). The TCT structure is subsequently exploited to efficiently compute May-Happen-in-Parallel (MHP) information for the analysis of multi-threaded programs. For concurrent Java programs, our MHP computation algorithm runs 1.77x (on an average) faster than previously reported MHP computation algorithm.
Conference Paper
Full-text available
It is now well established that the device scaling predicted by Moore's Law is no longer a viable option for increasing the clock frequency of future uniprocessor systems at the rate that had been sustained during the last two decades. As a result, future systems are rapidly moving from uniprocessor to multiprocessor configurations, so as to use parallelism instead of frequency scaling as the foundation for increased compute capacity. The dominant emerging multiprocessor structure for the future is a Non-Uniform Cluster Computing (NUCC) system with nodes that are built out of multi-core SMP chips with non-uniform memory hierarchies, and interconnected in horizontally scalable cluster configurations such as blade servers. Unlike previous generations of hardware evolution, this shift will have a major impact on existing software. Current OO language facilities for concurrent and distributed programming are inadequate for addressing the needs of NUCC systems because they do not support the notions of non-uniform data access within a node, or of tight coupling of distributed nodes.We have designed a modern object-oriented programming language, X10, for high performance, high productivity programming of NUCC systems. A member of the partitioned global address space family of languages, X10 highlights the explicit reification of locality in the form of places}; lightweight activities embodied in async, future, foreach, and ateach constructs; a construct for termination detection (finish); the use of lock-free synchronization (atomic blocks); and the manipulation of cluster-wide global data structures. We present an overview of the X10 programming model and language, experience with our reference implementation, and results from some initial productivity comparisons between the X10 and Java™ languages.
Conference Paper
Full-text available
We introduce two abstract models for multithreaded programs based on dynamic networks of pushdown systems. We address the problem of symbolic reachability analysis for these models. More precisely, we consider the problem of computing effective representations of their reachability sets using finite-state automata. We show that, while forward reachability sets are not regular in gen- eral, backward reachability sets starting from regular sets of configurations are always regular. We provide algorithms for computing backward reachability sets using word/tree automata, and show how these algorithms can be applied for flow analysis of multithreaded programs.
Article
Full-text available
. Information about which statements in a concurrent program may happen in parallel (MHP) has a number of important applications. It can be used in program optimization, debugging, program understanding tools, improving the accuracy of data flow approaches, and detecting synchronization anomalies, such as data races. In this paper we propose a data flow algorithm for computing a conservative estimate of the MHP information for Java programs that has a worstcase time bound that is cubic in the size of the program. We present a preliminary experimental comparison between our algorithm and a reachability analysis algorithm that determines the "ideal" static MHP information for concurrent Java programs. This initial experiment indicates that our data flow algorithm precisely computed the ideal MHP information in the vast majority of cases we examined. In the two out of 29 cases where the MHP algorithm turned out to be less than ideally precise, the number of spurious pairs was small compar...
Conference Paper
In this paper we present an implementation of May Happen in Parallel analysis for Java that attempts to address some of the practical implementation concerns of the original work. We describe a design that incorporates techniques for aiding a feasible implementation and expanding the range of acceptable inputs. We provide experimental results showing the utility and impact of our approach and optimizations using a variety of concurrent benchmarks.
Conference Paper
We present a core calculus with two of X10's key constructs for parallelism, namely async and finish. Our calculus forms a convenient basis for type systems and static analyses for languages with async-finish parallelism. For example, we present a type system for context-sensitive may-happen-in-parallel analysis, along with experimental results of performing type inference on 13,000 lines of X10 code. Our analysis runs in polynomial time and produces a low number of false positives, which suggests that our analysis is a good basis for other analyses such as race detectors. Our calculus is also a good basis for tractable proofs of correctness. For example, we give a one-page proof of the deadlock-freedom theorem of Saraswat and Jagadeesan, and we prove the correctness of our type system.
Conference Paper
Non-concurrency analysis is a set of techniques for statically identifying pairs (or sets) of statements in a concurrent program which can never happen together. This information aids programmers in debugging and manually optimizing programs, improves the precision of data flow analysis, enables optimized translation of rendezvous, facilitates dead code elimination and other automatic optimizations, and allows anomaly detection in explicitly parallel programs. We present a framework for non-concurrency analysis, capable of incorporating previous analysis algorithms [CS88, DS92] and improving upon them. We show general theoretical results which are useful in estimating non-concurrency, and examples of non-concurrency analysis frameworks for two synchronization primitives: the Ada rendezvous and binary semaphores. Both of these frameworks have a low-order polynomial bound on worst-case solution time. We provide experimental evidence that static non-concurrency analysis of Ada programs can be accomplished in a reasonable time, and is generally quite accurate. Our framework, and the set of refinement components we develop, also exhibits dramatic accuracy improvements over [DS91], when the latter is used as a stand-alone algorithm, as demonstrated by our experiments.
Conference Paper
The problem of Pairwise CFL-reachability is to decide whether two given program locations in different threads are simultaneously reachable in the presence of recursion in threads and scheduling constraints imposed by synchronization primitives. Pairwise CFL-reachability is the core problem underlying concurrent program analysis especially dataflow analysis. Unfortunately, it is undecidable even for the most commonly used synchronization primitive, i.e., mutex locks. Lock usage in concurrent programs can be characterized in terms of lock chains, where a sequence of mutex locks is said to be chained if the scopes of adjacent (nonnested mutexes overlap. Although pairwise reachability is known to decidable for threads interacting via nested locks i.e., chains of length one, these techniques don't extend to programs with non-nested locks used in crucial applications like databases and device drivers. In this paper, we exploit the fact that lock usage patterns in real life programs do not produce unbounded lock chains. For such programs, we show that pairwise CFL-reachability becomes decidable. Our new results narrow the decidability gap for pairwise CFL-reachability by providing a more refined characterization for it in terms of boundedness of lock chains rather than the current state-of-the-art, i.e., nestedness of locks (chains of length one).
Conference Paper
Race detection algorithms for multi-threaded programs using the common lock-based synchronization idiom must correlate locks with the memory locations they guard. The heart of a proof of race freedom is showing that if two locks are distinct, then t he memory locations they guard are also distinct. This is an example of a general property we call conditional must not aliasing: Under the assumption that two objects are not aliased, prove that t wo other objects are not aliased. This paper introduces and gives an algorithm for conditional must not alias analysis and discusses experimental results for sound race detection of Java programs.
Conference Paper
Although the data-flow framework is a powerful tool to statically analyze a program, current data-flow analysis techniques have not addressed the effect of procedures on concurrency analysis. This work develops a data race detection technique using a data-flow framework that analyzes concurrent events in a program in which tasks and procedures interact. There are no restrictions placed on the interactions between procedures and tasks, and thus recursion is permitted. Solving a system of data-flow equations, the technique computes a partial execution order for regions in the program by considering the control flow within a program unit, communication between tasks, and the cdlinglcreation context of procedures and tasks. From the computed execution order, con- current events are determined as unordered events. We show how the information about concurrent events can be used in debugging to automatically detect data races.
Article
Foundational to verification of some aspects of communicating concurrent systems is knowledge of the synchronization which may occur during execution. The synchronization determines the actions that may occur in parallel, may determine program data flow, and may also lead to inherently erroneous situations (e.g. deadlock). This paper formalizes the notion of the synchronization structure of concurrent programs that use the rendezvous (or similar) mechanism for achieving synchronization. The formalism is oriented towards supporting verification as performed by automated static program analysis. Complexity results are presented which indicate what may be expected in this area and which also shed light on the difficulty of correctly constructing concurrent systems. Specifically, most of the analysis tasks considered are shown to be intractable.
Article
This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). Copies may be requested from IBM T. J. Watson Research Center [Publications 16-220 ykt] P. O. Box 218, Yorktown Heights, NY 10598. email: reports@us.ibm.com