ABSTRACT: Speedup in distributed executions of constraint logic programming (CLP) applications is directly related to the quality of the constraint partitioning algorithm. In this work we study different mechanisms for distributing constraints to processors: straightforward mechanisms such as round-robin and block distribution, and a more sophisticated automatic distribution method, grouping-sink, that takes into account the connectivity of the constraint network graph. The goal is to reduce the communication overhead in distributed environments. Our results show that grouping-sink is, in general, the best alternative for partitioning constraints, as it produces results as good as or better than round-robin or block distribution, with a low communication rate.
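
The two straightforward distributions named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the representation of constraints as opaque items, and the `cross_links` helper (a rough stand-in for communication cost, counting constraint pairs that share a variable but sit on different processors) are all assumptions made here. Grouping-sink is not sketched because the abstract does not describe how it works.

```python
from typing import Dict, List, Sequence, Set, TypeVar

C = TypeVar("C")


def round_robin(constraints: Sequence[C], n_procs: int) -> List[List[C]]:
    """Assign constraint i to processor i mod n_procs."""
    parts: List[List[C]] = [[] for _ in range(n_procs)]
    for i, c in enumerate(constraints):
        parts[i % n_procs].append(c)
    return parts


def block_distribution(constraints: Sequence[C], n_procs: int) -> List[List[C]]:
    """Give each processor one contiguous slice; slice sizes differ by at most one."""
    base, extra = divmod(len(constraints), n_procs)
    parts: List[List[C]] = []
    start = 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        parts.append(list(constraints[start:start + size]))
        start += size
    return parts


def cross_links(parts: List[List[str]], scope: Dict[str, Set[str]]) -> int:
    """Hypothetical cost proxy: count pairs of constraints that share a
    variable but were placed on different processors."""
    owner = {c: p for p, part in enumerate(parts) for c in part}
    names = list(owner)
    return sum(
        1
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if owner[a] != owner[b] and scope[a] & scope[b]
    )
```

Given a mapping from each constraint to the set of variables it mentions, `cross_links(round_robin(cs, p), scope)` and `cross_links(block_distribution(cs, p), scope)` can be compared to see how strongly a placement cuts through the constraint network, which is the kind of connectivity information a graph-aware method would try to exploit.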
ABSTRACT: This work presents the parallelisation of the AC-5 arc-consistency algorithm for two different parallel architectures: a cluster of PCs and a centralised memory machine (CMM). We conducted our experiments using an adapted version of the PCSOS parallel constraint solving system over finite domains. The implementation for the cluster of PCs uses Treadmarks (TMK), a software distributed shared memory (SDSM) system. On the CMM we use synchronisation based on atomic read-modify-write primitives supported in hardware. We ran four benchmarks used by the original PCSOS to assess the performance of the system. We implemented different kinds of constraint partitioning and different kinds of distributed labeling that are not present in the original version. Our results show that arc-consistency algorithms achieve very good speedups on centralised memory systems and have great potential for parallelisation on low-cost distributed shared memory platforms. The performance of the benchmarks is greatly affected by the kind of partitioning and distributed labeling used. One of our applications achieves superlinear speedups due to distributed labeling. Speedups for the cluster of PCs are limited by the write-invalidate cache coherence protocol used by TMK, the extra synchronisation required by its memory consistency model, the size of the problem, and the kind of distribution of indexicals and labeling. Speedups for the CMM are better than for the cluster; however, this platform is more expensive than a cluster.
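
For readers unfamiliar with arc consistency, the sequential core that such systems parallelise can be illustrated with a short sketch. Note that this is the simpler, textbook AC-3 scheme, not AC-5, and it has nothing to do with the PCSOS code base; the names `revise` and `ac3`, and the encoding of constraints as predicates on directed arcs, are assumptions chosen for illustration.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Set, Tuple

Arc = Tuple[str, str]
Predicate = Callable[[Hashable, Hashable], bool]


def revise(domains: Dict[str, Set], x: str, y: str, allowed: Predicate) -> bool:
    """Drop values of x that have no supporting value in y.
    Returns True if the domain of x shrank."""
    supported = {a for a in domains[x] if any(allowed(a, b) for b in domains[y])}
    changed = supported != domains[x]
    domains[x] = supported
    return changed


def ac3(domains: Dict[str, Set], constraints: Dict[Arc, Predicate]) -> bool:
    """AC-3 style propagation (not AC-5). `constraints` maps a directed arc
    (x, y) to a predicate allowed(value_of_x, value_of_y).
    Returns False if some domain becomes empty."""
    queue = deque(constraints)
    while queue:
        x, y = queue.popleft()
        if revise(domains, x, y, constraints[(x, y)]):
            if not domains[x]:
                return False
            # Re-examine every arc pointing at x, except the one coming from y.
            queue.extend((z, w) for (z, w) in constraints if w == x and z != y)
    return True
```

With domains `x, y ∈ {1, 2, 3}` and the constraint `x < y` posted on both arcs, propagation prunes `x` to `{1, 2}` and `y` to `{2, 3}`. In a parallel setting the arcs would be partitioned among processors, so how they are distributed determines how often one processor's pruning must be communicated to another.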