Conference Paper

The "Forgetful" Weak Data Consistency Protocol


Abstract

One major issue in parallel computer design is providing a familiar programming model with minimum overhead. In this context the Distributed Shared Memory (DSM) model has been the focus of substantial research. In a DSM computer the physically distributed memory modules appear to the programmer as a single address space. To achieve high performance, DSM memory protocols use information derived from program synchronisation operations to perform consistency activities only when absolutely necessary. The programmer can therefore assume sequential consistency when writing code, while the underlying hardware works out when such consistency is actually needed. A survey of recent publications identifies three major classes of protocol: Weakly Consistent (WC), Release Consistent (RC), and Processor Consistent (PC). Optimisation using write pipelining has been proposed to reduce update delays, and lazy invalidation policies have been investigated. Attempts have also been made to use knowledge about how data objects are manipulated to customise the manner in which consistency is enforced. All of these approaches assume that conveying written data to memory and carrying out any associated consistency activity form an indivisible unit. Forgetful consistency discards this connection and extends the WC class of protocol, using a low-cost cache hardware modification to perform selective cache flushing on each node. This purges shared data from the processor caches at consistency events (barriers, mutual exclusion), eliminating the need for invalidation messages and the associated hardware. The evaluations reported here show that Forgetful consistency performs very well in comparison to a standard write-pipelined weak consistency protocol. Eliminating invalidation simplifies hardware designs and also leads to an overall decrease in network utilisation by each application. These factors combine to yield improvements in execution speed of up to 30% for some applications on large machines.
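
The selective flushing idea described in the abstract can be illustrated with a small software model. The following is only a toy sketch under stated assumptions, not the hardware design from the paper: the class name `ForgetfulCache`, the per-line `shared` flag, and the use of immediate write-through in place of write pipelining are all hypothetical simplifications chosen for illustration.

```python
class ForgetfulCache:
    """Toy per-node cache that forgets shared lines at consistency events."""

    def __init__(self, memory):
        self.memory = memory   # dict standing in for the global shared memory
        self.lines = {}        # addr -> (value, is_shared)

    def read(self, addr, shared):
        if addr not in self.lines:
            # Miss: fetch the current value from memory.
            self.lines[addr] = (self.memory.get(addr, 0), shared)
        return self.lines[addr][0]

    def write(self, addr, value, shared):
        self.lines[addr] = (value, shared)
        # Assumption: writes reach memory immediately (stand-in for write pipelining).
        self.memory[addr] = value

    def consistency_event(self):
        # Barrier or lock operation: purge shared lines, keep private ones.
        # No invalidation messages are exchanged between nodes.
        self.lines = {a: l for a, l in self.lines.items() if not l[1]}


memory = {}
node_a, node_b = ForgetfulCache(memory), ForgetfulCache(memory)

node_a.write(0x10, 1, shared=True)
node_b.read(0x10, shared=True)           # node_b now caches the value 1
node_b.write(0x20, 7, shared=False)      # private data on node_b

node_a.write(0x10, 2, shared=True)
stale = node_b.read(0x10, shared=True)   # still the cached copy between sync points

node_b.consistency_event()               # shared lines are forgotten
fresh = node_b.read(0x10, shared=True)   # miss, so the value is refetched from memory
```

In this model a node may observe a stale copy of shared data between synchronisation points, which weak consistency permits, and obtains the up-to-date value after the consistency event purely by refetching on a miss. Private lines survive the flush, which is what makes the flushing selective.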
