Conference Paper

A low latency, loss tolerant architecture and protocol for wide area group communication

Dept. of Comput. Sci., Johns Hopkins Univ., Baltimore, MD
DOI: 10.1109/ICDSN.2000.857557
Conference: Proceedings of the International Conference on Dependable Systems and Networks (DSN 2000)
Source: IEEE Xplore

ABSTRACT Group communication systems are proven tools upon which to build fault-tolerant systems. As the demands for fault tolerance increase and more applications require reliable distributed computing over wide area networks, wide area group communication systems are becoming increasingly useful. However, building a wide area group communication system is a challenge. This paper presents the design of the transport protocols of the Spread wide area group communication system. We focus on two aspects of the system: first, the value of using overlay networks for application level group communication services; second, the requirements and design of effective low latency link protocols used to construct wide area group communication. We support our claims with the results of live experiments conducted over the Internet.

  • ABSTRACT: In recent years, much of the discussion involving “smart grids” has implicitly involved only the distribution side, notably advanced metering. However, today's electric systems have many challenges that also involve the rest of the system. An enabling technology for improving the power system, which has emerged in recent years, is the ability to measure coherent, real-time data. In this paper, we describe major challenges facing electrical generation and transmission today that the availability of these measurements can help address. We overview applications using coherent, real-time measurements that are in use today or proposed by researchers. Specifically, we describe, normalize, and then quantitatively compare key factors for these power applications that influence how the delivery system should be planned, implemented, and managed. These factors include whether a person or computer is in the loop and (for both inputs and outputs) latency, rate, criticality, quantity, and geographic scope. From this, we abstract the baseline communications requirements of a data delivery system supporting these applications and suggest implementation guidelines to achieve them. Finally, we overview the state of the art in the supporting computer science areas of overlay networking and distributed computing (including middleware) and analyze gaps in commercial middleware products, utility standards, and issues that limit low-level network protocols from meeting these requirements when used in isolation.
    Proceedings of the IEEE 06/2011; 99(6):928-951. · 5.47 Impact Factor
  • ABSTRACT: Dynamic systems are becoming more and more widespread in many application fields. This observation highlights the trend that systems are no longer considered to run in a static, predefined environment. Evolution has to be taken into account, and fault tolerance is no exception. When the execution or environmental context has changed, the hypotheses or fault model may become outdated or invalid, so on-line adaptation of fault tolerance has to be tackled. This paper deals with the design of fault tolerance for on-line adaptation. It describes a reflective architecture suitable for modifying fault tolerance at runtime. It then shows how fault tolerance may be broken down into small components to enable its runtime modification.
  • ABSTRACT: Information Management (IM) services provide a powerful capability for military operations, enabling managed information exchange based on the characteristics of the information that is needed and the information that is available, rather than on explicit knowledge of the information consumers, producers, and repositories. To be usable in tactical environments and mission-critical operations, IM services need to be resilient to faults and failures, which can be due to many factors, including design or implementation flaws, misconfiguration, corruption, hardware or infrastructure failure, resource intermittency or contention, or hostile actions. This paper presents a reference model for representing the performance and fault tolerance requirements of IM services in tactical operations. A Joint Close Air Support operation is described using this representation, and the viability of canonical fault tolerance techniques is examined for a given deployment.