Conference Paper

A low latency, loss tolerant architecture and protocol for wide area group communication

Dept. of Comput. Sci., Johns Hopkins Univ., Baltimore, MD
DOI: 10.1109/ICDSN.2000.857557
Conference: Proceedings of the International Conference on Dependable Systems and Networks (DSN 2000)
Source: IEEE Xplore


Group communication systems are proven tools upon which to build fault-tolerant systems. As the demands for fault tolerance increase and more applications require reliable distributed computing over wide area networks, wide area group communication systems are becoming very useful. However, building a wide area group communication system is a challenge. This paper presents the design of the transport protocols of the Spread wide area group communication system. We focus on two aspects of the system: first, the value of using overlay networks for application-level group communication services; second, the requirements and design of effective low latency link protocols used to construct wide area group communication. We support our claims with the results of live experiments conducted over the Internet.


Available from:
  • Source
    • "A GCS usually offers an atomic (i.e., total order) multicast message delivery service which enables an application to send messages to a set of destinations such that they are delivered in the same order by each destination. Group communication and total order topics have been studied for more than two decades from both a theoretical [4] [7] and a practical [3] [8] [14] [1] point of view. A useful additional guarantee a GCS may offer is priority-based delivery [19] [17] [15], which allows a user application to prioritize the sending and delivery of certain messages. "
    ABSTRACT: A prioritized atomic multicast protocol allows an application to tag messages with a priority that expresses their urgency and tries to deliver first those with a higher priority. For instance, such a service can be used in a database replication context, to reduce the transaction abort rate when integrity constraints are used. We present a study of the three most important and well-known classes of atomic multicast protocols in which we evaluate the cost imposed by the prioritization mechanisms, in terms of additional latency overhead, computational cost and memory use. This study reveals that the behavior of the protocols depends on the particular properties of the setting (number of nodes, message sending rates, etc.) and that the extra work done by a prioritized protocol does not introduce any additional latency overhead in most of the evaluated settings. This study is also a performance comparison of these classes of total order protocols and can be used by system designers to choose the proper prioritized protocol for a given deployment.
    Full-text · Conference Paper · Nov 2009
  • Source
    • "In order to address our research question, we study a replicated distributed system. The system runs over a token-ring group communication protocol, called Spread [2], to maintain message ordering and reliability. The token-ring approach of Spread greatly constrains the network flow at any given point in time. "
    ABSTRACT: Communication infrastructures that provide distributed systems with key services can also end up being the medium whereby faults propagate through the system. We have previously observed that a single faulty node can degrade the performance of other, non-faulty nodes in the system. We present a method for identifying the node that is the origin of the failure by examining the architecture-driven constrained network-flows in a distributed system. By identifying the effects of the failure on the network, combined with our knowledge of the network-flow constraints, we can trace the effects of the failure back to its source node. We empirically evaluate our methods on a data set that was generated by injecting multiple performance-faults into a replicated middleware system with an underlying token-ring based group communication protocol. We correctly identify the faulty node in the case of failures that significantly change the performance characteristics of the network.
    Preview · Article · Jan 2007
  • Source
    • "MADIS has been implemented in Java, with a JDBC interface to applications. Its current communication support is provided by the Spread group communication system [1] [4]. Since, in this paper, nothing beyond the manner in which the detection of conflicting concurrent transactions is supported by the middleware architecture, no additional details about the group communication system nor about MADIS are relevant. "
    ABSTRACT: Database replication protocols need to detect, block or abort part of conflicting transactions. A possible solution is to check their writesets (and also their readsets when a serialisable isolation level is requested), which however increases CPU time consumption. This gets even worse when the replication support is provided by a middleware, since there is no direct DBMS support in that layer. We propose and discuss the use of the concurrency control support of the local DBMS for detecting conflicts between local transactions and writesets of remote transactions. This simplifies many database replication protocols and enhances their performance.
    Full-text · Conference Paper · Oct 2006