Conference Paper

Characterizing parallel workloads to reduce multiple writer overhead in shared virtual memory systems

Departamento de Informatica de Sistemas y Computadores, Univ. Politecnica de Valencia
DOI: 10.1109/EMPDP.2002.994285
Conference: Proceedings of the 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing, 2002
Source: IEEE Xplore

ABSTRACT Shared virtual memory (SVM) systems, because of their software
implementation, enable shared-memory programming at a low design and
maintenance cost. Nevertheless, as hardware implementations become
faster, the performance of SVM systems remains far from that achieved by
distributed shared memory (DSM) systems. Current SVM systems use relaxed memory
consistency models and multiple writer protocols as techniques to reduce
latencies and false sharing, respectively. However, these techniques
induce additional overhead that decreases system performance. We
performed a study of workload behavior aimed at improving the design of
SVM protocols. The work focused on identifying the types of shared data
patterns that can appear in accesses to sections protected by
semaphores. Most coherence actions in SVM systems are
performed as a consequence of the write operations executed in critical
sections, so we pay special attention to the write operations performed
when multiple writers are allowed. As these write operations may present
spatial locality, we also study the write patterns on shared pages with
similar behavior. Different software filters are applied to the
selected instrumented parallel workloads to capture and classify the
most common sharing patterns. This enables the recognition of those
patterns in which coherence overhead can be reduced by modifying the
coherence actions performed by the protocol. Although the
performance evaluation of new coherence solutions is not our main goal,
the ideas presented to improve the behavior of SVM systems can be
implemented at a reasonable hardware/software cost.

