Cluster Computing

Published by Springer Nature

Online ISSN: 1573-7543


Print ISSN: 1386-7857


Condor-G: A computation management agent for multi-institutional Grids
  • Conference Paper

February 2001


124 Reads


Todd Tannenbaum





In recent years, there has been a dramatic increase in the amount of available computing and storage resources, yet few have been able to exploit these resources in an aggregated form. We present the Condor-G system, which leverages software from Globus and Condor to allow users to harness multi-domain resources as if they all belong to one personal domain. We describe the structure of Condor-G and how it handles job management, resource selection, security, and fault tolerance.

Risk-Aware Limited Lookahead Control for Dynamic Resource Provisioning in Enterprise Computing Systems

July 2006


59 Reads

Utility or on-demand computing, a provisioning model where a service provider makes computing infrastructure available to customers as needed, is becoming increasingly common in enterprise computing systems. Realizing this model requires making dynamic, and sometimes risky, resource provisioning and allocation decisions in an uncertain operating environment to maximize revenue while reducing operating cost. This paper develops an optimization framework wherein the resource provisioning problem is posed as one of sequential decision making under uncertainty and solved using a limited lookahead control scheme. The proposed approach accounts for the switching costs incurred during resource provisioning and explicitly encodes risk in the optimization problem. Simulations using workload traces from the Soccer World Cup 1998 web site show that a computing system managed by our controller generates up to 20% more revenue than a system without dynamic control while incurring low control overhead.
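
As a rough illustration of the limited lookahead idea, the sketch below scores every candidate allocation plan over a short horizon (revenue minus operating, switching, and risk costs) and commits only the first decision before re-planning. All parameter values, the demand forecaster, and the risk penalty form are invented for illustration; they are not the paper's actual model.

```python
# A minimal sketch of risk-aware limited lookahead control for resource
# provisioning. Costs, the forecaster, and the risk term are hypothetical.
from itertools import product

HORIZON = 3                      # lookahead depth (control steps)
CANDIDATE_SERVERS = [2, 4, 8]    # allowed allocation levels per step
REVENUE_PER_REQ = 0.01
COST_PER_SERVER = 1.0
SWITCH_COST = 0.5                # cost per server added or removed
RISK_WEIGHT = 0.2                # penalty on high-variance (risky) plans

def forecast(step):
    """Hypothetical demand forecast with an uncertainty estimate."""
    mean = 300 + 100 * step
    stddev = 50 * (step + 1)     # uncertainty grows with lookahead depth
    return mean, stddev

def plan_value(current, plan):
    """Expected profit of an allocation plan, discounted by risk."""
    value, prev = 0.0, current
    for step, servers in enumerate(plan):
        mean, stddev = forecast(step)
        served = min(mean, servers * 100)   # each server handles ~100 req/step
        value += served * REVENUE_PER_REQ
        value -= servers * COST_PER_SERVER
        value -= abs(servers - prev) * SWITCH_COST     # switching cost
        value -= RISK_WEIGHT * stddev * REVENUE_PER_REQ  # hedge against variance
        prev = servers
    return value

def next_allocation(current):
    """Limited lookahead: score every plan of length HORIZON, apply only
    the first decision, and re-plan at the next control step."""
    best = max(product(CANDIDATE_SERVERS, repeat=HORIZON),
               key=lambda plan: plan_value(current, plan))
    return best[0]

print(next_allocation(current=2))
```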

Link contention-constrained scheduling and mapping of tasks and messages to a network of heterogeneous processors

February 1999


26 Reads

In this paper, we consider the problem of scheduling and mapping precedence-constrained tasks to a network of heterogeneous processors. In such systems, processors are usually physically distributed, implying that the communication cost is considerably higher than in tightly coupled multiprocessors. Therefore, scheduling and mapping algorithms for such systems must schedule the tasks as well as the communication traffic by treating both the processors and communication links as important resources. We propose an algorithm that achieves these objectives and adapts its task scheduling and mapping decisions to the given network topology. Just like tasks, messages are also scheduled and mapped to suitable links while minimizing the finish times of tasks. Heterogeneity of processors is exploited by scheduling critical tasks onto the fastest processors. Our extensive experimental study demonstrates that the proposed algorithm is efficient, robust, and yields consistent performance over a wide range of scheduling parameters.
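
The sketch below illustrates the general flavor of contention-constrained list scheduling: the communication link is treated as a schedulable resource alongside the heterogeneous processors, so every inter-processor message occupies link time. The task graph, processor speeds, and earliest-finish-time tie-breaking are invented for illustration; this is not the paper's algorithm.

```python
# A condensed sketch of contention-aware list scheduling over one shared
# link. Tasks are taken in a topological order; each candidate placement
# accounts for both processor availability and link availability.

tasks = {            # task -> (work, [predecessors]); dict order is topological
    "A": (4, []), "B": (8, ["A"]), "C": (8, ["A"]), "D": (5, ["B", "C"]),
}
comm = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 3, ("C", "D"): 2}
speeds = {"P0": 1.0, "P1": 0.9}        # heterogeneous processor speeds

proc_free = {p: 0.0 for p in speeds}   # processor availability times
link_free = 0.0                        # shared link availability time
placed, finish = {}, {}

for t, (work, preds) in tasks.items():
    best = None
    for p in speeds:
        ready, link_used = proc_free[p], link_free
        for u in preds:
            if placed[u] == p:
                ready = max(ready, finish[u])
            else:                      # message must be scheduled on the link
                start = max(finish[u], link_used)
                link_used = start + comm[(u, t)]
                ready = max(ready, link_used)
        done = ready + work / speeds[p]
        if best is None or done < best[0]:
            best = (done, p, link_used)
    finish[t], placed[t], link_free = best
    proc_free[placed[t]] = finish[t]

print(placed)   # C and D migrate to P1 once the link cost is paid
print(finish)
```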

Design and implementation of a pluggable fault tolerant CORBA infrastructure

February 2002


45 Reads

In this paper we present the design and implementation of a Pluggable Fault Tolerant CORBA Infrastructure that provides fault tolerance for CORBA applications by utilizing the pluggable protocols framework that is available for most CORBA ORBs. Our approach does not require modification to the CORBA ORB and requires only minimal modifications to the application. Moreover, it avoids the difficulty of retrieving and assigning the ORB state that would arise if the fault tolerance mechanisms were incorporated into the ORB. The Pluggable Fault Tolerant CORBA Infrastructure achieves performance that is similar to, or better than, that of other Fault Tolerant CORBA systems, while providing strong replica consistency.

Internal node and shortcut based routing with guaranteed delivery in wireless networks
  • Conference Paper
  • Full-text available

May 2001


22 Reads

Several distributed routing algorithms for wireless networks have been described recently, based on location information of nodes available via the Global Positioning System (GPS). In a greedy routing algorithm, the sender or node S currently holding the message m forwards m to the neighbor closest to the destination. The algorithm fails if S has no neighbor closer to the destination than S itself. The FACE algorithm guarantees the delivery of m if the network, modeled by a unit graph, is connected. The GFG algorithm combines the greedy and FACE algorithms. We further improve the performance of the GFG algorithm by reducing its average hop count. First we improve the FACE algorithm by adding a sooner-back procedure for earlier escape from FACE mode. Then we perform a shortcut procedure at each forwarding node S: node S uses the local information available to calculate as many hops as possible and forwards the packet directly to the last known hop instead of forwarding it to the next hop. The second improvement is based on the concept of dominating sets. The network of internal nodes defines a connected dominating set, and each node must be either internal or directly connected to an internal node. We apply several existing definitions of internal nodes, namely the concepts of intermediate, inter-gateway, and gateway nodes. We propose to run GFG routing, enhanced by the shortcut procedure, on the dominating set, except possibly for the first and last hops. We obtain a localized routing algorithm that guarantees delivery and has very low excess hop count compared to the shortest path algorithm. Experimental data show that the length of the additional path can be reduced to about half that of the existing GFG algorithm.
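
To make the greedy step concrete, here is a toy sketch of position-based greedy forwarding. The example topology deliberately contains a void so that greedy fails, which is exactly the situation the FACE recovery mode (only stubbed here, since it requires planarizing the graph) is designed to handle. Coordinates and radio range are invented.

```python
# A toy sketch of greedy geographic forwarding. Real GFG switches to FACE
# mode (face traversal of a planar subgraph) when greedy gets stuck.
import math

RANGE = 1.5
nodes = {"S": (0, 0), "A": (1, 0.5), "B": (1, -1), "D": (3, 0)}

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def neighbors(u):
    return [v for v in nodes if v != u and dist(nodes[u], nodes[v]) <= RANGE]

def greedy_route(src, dst):
    path, cur = [src], src
    while cur != dst:
        # forward to the neighbor geographically closest to the destination
        cand = min(neighbors(cur), key=lambda v: dist(nodes[v], nodes[dst]))
        if dist(nodes[cand], nodes[dst]) >= dist(nodes[cur], nodes[dst]):
            return path, "stuck: GFG would switch to FACE recovery here"
        path.append(cand)
        cur = cand
    return path, "delivered"

print(greedy_route("S", "D"))
```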

HSK: A Hierarchical Parallel Simulation Kernel for Multicore Platform

June 2011


23 Reads

CPU development has entered the multi-core era. Due to a lack of thread-level support, most simulation platforms cannot take full advantage of multicore processors. To fill this gap, we propose a hierarchical parallel simulation kernel (HSK) model. The model has two layers: the first layer, named the process kernel, is responsible for managing all thread kernels on the second layer; the second layer is a group of thread kernels responsible for scheduling and advancing logical processes. Each thread kernel is mapped onto an executing thread to advance the simulation in parallel. In addition, two algorithms are proposed to support high performance: (1) to improve communication efficiency between threads, we propose a pointer-based communication mechanism in which buffering eliminates synchronization between threads; (2) to eliminate redundant Lower Bound on Time Stamp (LBTS) computations without interrupting thread execution, we employ an approximate method that computes LBTS asynchronously, together with a proof of validity. The execution performance of HSK is demonstrated by a series of simulation experiments with a modified PHOLD model; HSK achieves good speedup, especially for applications with coarse-grained events.

Topological Characteristics of Random Multihop Wireless Networks

June 2003


14 Reads

Multihop wireless networks are treated as random symmetric planar point graphs, where all nodes have the same transmission power and radius, and the vertices of a graph are drawn randomly over a certain geographical region. Several basic and important topological properties of random multihop wireless networks are studied, including node degree, connectivity, diameter, bisection width, and biconnectivity. It is believed that such a study has very useful implications for real applications.
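
A quick sketch of the underlying random unit-disk model: n nodes dropped uniformly in a unit square, with an edge whenever two nodes lie within the transmission radius r; node degree and connectivity then follow directly. The values of n and r below are arbitrary illustrations.

```python
# Random unit-disk graph: sample points, connect pairs within radius r,
# then report degree statistics and check connectivity by BFS.
import math, random
from collections import deque

random.seed(1)
n, r = 100, 0.2
pts = [(random.random(), random.random()) for _ in range(n)]
adj = [[j for j in range(n)
        if j != i and math.dist(pts[i], pts[j]) <= r] for i in range(n)]

degrees = [len(a) for a in adj]
print("min/avg degree:", min(degrees), sum(degrees) / n)

# connectivity check: breadth-first search from node 0
seen, queue = {0}, deque([0])
while queue:
    u = queue.popleft()
    for v in adj[u]:
        if v not in seen:
            seen.add(v)
            queue.append(v)
print("connected:", len(seen) == n)
```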



1000 Islands: An integrated approach to resource management for virtualized datacenters

March 2009


376 Reads

Recent advances in hardware and software virtualization offer unprecedented management capabilities for the mapping of virtual resources to physical resources. It is highly desirable to further create a “service hosting abstraction” that allows application owners to focus on service level objectives (SLOs) for their applications. This calls for a resource management solution that achieves the SLOs for many applications in response to changing data center conditions and hides the complexity from both application owners and data center operators. In this paper, we describe an automated capacity and workload management system that integrates multiple resource controllers at three different scopes and time scales. Simulation and experimental results confirm that such an integrated solution ensures efficient and effective use of data center resources while reducing service level violations for high priority applications.

IEEE 802.11 Wireless LAN: Saturation Throughput Analysis with Seizing Effect Consideration

April 2002


179 Reads

The IEEE 802.11 network technology is the emerging standard for wireless LANs and mobile networking. The fundamental access mechanism in the IEEE 802.11 MAC protocol is the Distributed Coordination Function. In this paper, we present an analytical method of estimating the saturation throughput of an 802.11 wireless LAN under the assumption of ideal channel conditions. The proposed method generalizes the existing 802.11 LAN models and advances them in order to take the Seizing Effect into consideration. This real-life effect consists in the following: the station that has just successfully completed its transmission has a better chance of winning the competition, and therefore of seizing the channel, than other LAN stations. The saturation throughput of 802.11 wireless LANs is investigated with the developed method. The obtained numerical results are validated by simulation and lead to a revision of the existing view of the optimal access strategy under saturation conditions.
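
For context, the sketch below computes the classic saturation-throughput expression (in the style of Bianchi's widely used model) that analyses of this family generalize: each station transmits in a randomly chosen slot with probability tau, and throughput follows from the slot-outcome probabilities. The seizing-effect extension modifies the per-station transmission probabilities; the timing parameters here are invented abstract units, not 802.11 constants.

```python
# Saturation throughput in a Bianchi-style slotted model: a slot is idle,
# a success (exactly one transmitter), or a collision, each with its own
# duration; throughput is useful payload per unit of channel time.

def saturation_throughput(n, tau, payload=8184,
                          slot=50, t_success=9000, t_collision=8700):
    """n stations, per-slot transmit probability tau; times in abstract units."""
    p_tr = 1 - (1 - tau) ** n                    # some station transmits
    p_s = n * tau * (1 - tau) ** (n - 1) / p_tr  # ...and exactly one does
    busy = p_tr * (p_s * t_success + (1 - p_s) * t_collision)
    idle = (1 - p_tr) * slot
    return p_tr * p_s * payload / (idle + busy)

for n in (5, 10, 20):
    print(n, round(saturation_throughput(n, tau=0.05), 3))
```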

Harnessing Parallelism in Multicore Clusters with the All-Pairs, Wavefront, and Makeflow Abstractions

September 2010


173 Reads

Both distributed systems and multicore systems are difficult programming environments. Although the expert programmer may be able to carefully tune these systems to achieve high performance, the non-expert may struggle. We argue that high level abstractions are an effective way of making parallel computing accessible to the non-expert. An abstraction is a regularly structured framework into which a user may plug in simple sequential programs to create very large parallel programs. By virtue of a regular structure and declarative specification, abstractions may be materialized on distributed, multicore, and distributed multicore systems with robust performance across a wide range of problem sizes. In previous work, we presented the All-Pairs abstraction for computing on distributed systems of single CPUs. In this paper, we extend All-Pairs to multicore systems, and introduce the Wavefront and Makeflow abstractions, which represent a number of problems in economics and bioinformatics. We demonstrate good scaling of both abstractions up to 32 cores on one machine and hundreds of cores in a distributed system. Keywords: Abstractions, Multicore, Distributed systems, Bioinformatics, Economics.
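
As a minimal illustration of the Wavefront abstraction, the sketch below evaluates a recurrence in which cell (i, j) depends on its west, south, and south-west neighbors, so all cells on one anti-diagonal are independent and can run in parallel. The cell function f is a placeholder standing in for the user's sequential program, not any function from the paper.

```python
# Wavefront sketch: sweep anti-diagonals in order; within one diagonal,
# cells touch disjoint entries and can be evaluated concurrently.
from concurrent.futures import ThreadPoolExecutor

N = 4
R = [[0] * (N + 1) for _ in range(N + 1)]
for k in range(N + 1):              # boundary values
    R[k][0] = R[0][k] = k

def f(west, south, diag):
    return max(west, south) + diag  # stand-in for the user's function

def cell(i, j):
    R[i][j] = f(R[i - 1][j], R[i][j - 1], R[i - 1][j - 1])

with ThreadPoolExecutor() as pool:
    for d in range(2, 2 * N + 1):   # anti-diagonal index d = i + j
        wave = [(i, d - i) for i in range(1, N + 1) if 1 <= d - i <= N]
        list(pool.map(lambda ij: cell(*ij), wave))

print(R[N][N])
```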

Web Proxy Acceleration

January 2001


1,084 Reads

Numerous studies show that miss ratios at forward proxies are typically at least 40–50%. This paper proposes and evaluates a new approach for improving the throughput of Web proxy systems by reducing the overhead of handling cache misses. Namely, we propose to front-end a Web proxy with a high performance node that filters the requests, processing the misses and forwarding the hits and the new cacheable content to the proxy. Requests are filtered based on hints of the proxy cache content. This system, called Proxy Accelerator, achieves significantly better communications performance than a traditional proxy system. For instance, an accelerator can be built as an embedded system optimized for communication and HTTP processing, or as a kernel-mode HTTP server. Scalability with the Web proxy cluster size is achieved by using several accelerators. We use analytical models, trace-based simulations, and a real implementation to study the benefits and the implementation tradeoffs of this new approach. Our results show that a single proxy accelerator node in front of a 4-node Web proxy can improve the cost-performance ratio by about 40%. Hint-based request filter implementation choices that do not affect the overall hit ratio are available. An implementation of the hint management module integrated in Web proxy software is presented. Experimental evaluation of the implementation demonstrates that the associated overheads are very small.
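
A small sketch of hint-based request filtering with a Bloom filter, the hint representation named above: the accelerator keeps a compact, lossy summary of the proxy's cache content, forwards probable hits to the proxy, and handles misses itself. The filter size and hashing scheme below are illustrative, not the paper's parameters.

```python
# Bloom-filter hints: membership tests may give false positives (a request
# forwarded to the proxy that turns out to miss) but never false negatives.
import hashlib

M = 8192          # bits in the hint filter
K = 4             # hash functions per URL

def positions(url):
    digest = hashlib.sha256(url.encode()).digest()
    # derive K bit positions from independent 4-byte slices of one digest
    return [int.from_bytes(digest[4*i:4*i+4], "big") % M for i in range(K)]

class BloomHint:
    def __init__(self):
        self.bits = bytearray(M // 8)

    def add(self, url):               # proxy reports newly cached content
        for p in positions(url):
            self.bits[p // 8] |= 1 << (p % 8)

    def probably_cached(self, url):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in positions(url))

hint = BloomHint()
hint.add("http://example.com/a.html")
for url in ["http://example.com/a.html", "http://example.com/b.html"]:
    route = "forward to proxy (hit)" if hint.probably_cached(url) \
            else "handle miss locally"
    print(url, "->", route)
```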

Collision avoidance and resolution multiple access (CARMA)

June 1998


45 Reads

The collision avoidance and resolution multiple access (CARMA) protocol is presented and analyzed. CARMA uses a collision avoidance handshake in which the sender and receiver exchange a request to send (RTS) and a clear to send (CTS) before the sender transmits any data. CARMA is based on carrier sensing, together with collision resolution based on a deterministic tree-splitting algorithm. For analytical purposes, an upper bound is derived for the average number of steps required to resolve collisions of RTSs using the tree-splitting algorithm. This bound is then applied to the computation of the average channel utilization in a fully connected network with a large number of stations. Under light-load conditions, CARMA achieves the same average throughput as multiple access protocols based on RTS/CTS exchange and carrier sensing. It is also shown that, as the arrival rate of RTSs increases, the throughput achieved by CARMA is close to the maximum throughput that any protocol based on collision avoidance (i.e., RTS/CTS exchange) can achieve if the control packets used to acquire the floor are much smaller than the data packet trains sent by the stations. Simulation results validate the simplifying approximations made in the analytical model. Our analysis results indicate that collision resolution makes floor acquisition multiple access much more effective.
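
The sketch below illustrates deterministic tree splitting, the collision-resolution core named in the abstract: when several stations' RTSs collide, the contending set is repeatedly split by address bits until each subset holds at most one station. The station IDs and recursion details are illustrative, not the protocol's exact rules.

```python
# Toy deterministic tree splitting: each probe of a subset costs one
# channel step; collided subsets are split on the next address bit.

def resolve(stations, bit=7, steps=0):
    """Return (resolution order, number of contention steps)."""
    steps += 1                              # one channel step per probe
    if len(stations) <= 1:
        return list(stations), steps        # success or idle slot
    left = [s for s in stations if not (s >> bit) & 1]
    right = [s for s in stations if (s >> bit) & 1]
    order_l, steps = resolve(left, bit - 1, steps)
    order_r, steps = resolve(right, bit - 1, steps)
    return order_l + order_r, steps

order, steps = resolve([0b1011_0010, 0b1011_1100, 0b0010_0001])
print("transmission order:", [bin(s) for s in order], "steps:", steps)
```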

Failure-Atomic File Access in the Slice Interposed Network Storage System

October 2002


52 Reads

This paper presents a recovery protocol for block I/O operations in Slice, a storage system architecture for high-speed LANs incorporating network-attached block storage. The goal of the Slice architecture is to provide a network file service with scalable bandwidth and capacity while preserving compatibility with off-the-shelf clients and file server appliances. The Slice prototype virtualizes the Network File System (NFS) protocol by interposing a request switching filter at the client's interface to the network storage system. The distributed Slice architecture separates functions typically combined in central file servers, introducing new challenges for failure atomicity. This paper presents a protocol for atomic file operations and recovery in the Slice architecture, and related support for reliable file storage using mirrored striping. Experimental results from the Slice prototype show that the protocol has low cost in the common case, allowing the system to deliver client file access bandwidths approaching gigabit-per-second network speeds.

Seamless Access to Decentralized Storage Services in Computational Grids via a Virtual File System

April 2004


129 Reads

This paper describes a novel technique for establishing a virtual file system that allows data to be transferred user-transparently and on-demand across computing and storage servers of a computational grid. Its implementation is based on extensions to the Network File System (NFS) that are encapsulated in software proxies. A key differentiator between this approach and previous work is the way in which file servers are partitioned: while conventional file systems share a single (logical) server across multiple users, the virtual file system employs multiple proxy servers that are created, customized and terminated dynamically, for the duration of a computing session, on a per-user basis. Furthermore, the solution does not require modifications to standard NFS clients and servers. The described approach has been deployed in the context of the PUNCH network-computing infrastructure, and is unique in its ability to integrate unmodified, interactive applications (even commercial ones) and existing computing infrastructure into a network computing environment. Experimental results show that: (1) the virtual file system performs well in comparison to native NFS in a local-area setup, with mean overheads of 1 and 18%, for the single-client execution of the Andrew benchmark in two representative computing environments, (2) the average overhead for eight clients can be reduced to within 1% of native NFS with the use of concurrent proxies, (3) the wide-area performance is within 1% of the local-area performance for a typical compute-intensive PUNCH application (SimpleScalar), while for the I/O-intensive application Andrew the wide-area performance is 5.5 times worse than the local-area performance.

Performance analysis of ALOHA and p-persistent ALOHA for multi-hop underwater acoustic sensor networks

March 2011


643 Reads

The extreme conditions under which multi-hop underwater acoustic sensor networks (UASNs) operate constrain the performance of medium access control (MAC) protocols. The MAC protocol employed significantly impacts the operation of the network supported, and such impacts must be carefully considered when developing protocols for networks constrained by both bandwidth and propagation delay. Time-based coordination, such as TDMA, has limited applicability due to the dynamic nature of the water channel used to propagate the sound signals, as well as the significant effect of relatively small changes in propagation distance on the propagation time. These effects cause inaccurate time synchronization and therefore make time-based access protocols less viable. The large propagation delays also diminish the effectiveness of carrier sense protocols, as they cannot predict with any certainty the status of the intended recipients at the point when the traffic would arrive. Thus, CSMA protocols do not perform well in UASNs, either. Reservation-based protocols have seldom been successful in commercial products over the past 50 years due to many drawbacks, such as limited scalability and relatively low robustness. In particular, the impact of propagation delays in UASNs and other such constrained networks obfuscates the operation of the reservation protocols and diminishes, if not completely negates, the benefit of reservations. The efficacy of the well-known RTS-CTS scheme, as a reservation-based enhancement to the CSMA protocol, is also adversely impacted by long propagation delays. An alternative to these MAC protocols is the much less complex ALOHA protocol, or one of its variants. However, the performance of such protocols within the context of multi-hop networks is not well studied. In this paper we identify the challenges of modeling contention-based MAC protocols and present models for analyzing ALOHA and p-persistent ALOHA variants for a simple string topology. As expected, an application of the model suggests that ALOHA variants are very sensitive to traffic loads. Indeed, when the traffic load is small, utilization becomes insensitive to p values. A key finding, though, is the significance of the network size on the protocols' performance, in terms of successful delivery of traffic from outlying nodes, indicating that such protocols are only appropriate for very small networks, as measured by hop count. Keywords: Underwater acoustic sensor networks, MAC, ALOHA, p-persistent ALOHA, Multi-hop.
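
A back-of-the-envelope sketch of why such protocols are sensitive to load and to the choice of p: in a slot with n backlogged stations, a transmission succeeds only when exactly one station transmits. This single-hop calculation is far simpler than the paper's multi-hop string-topology model and is offered only as intuition.

```python
# P(success) for p-persistent slotted access with n backlogged stations:
# exactly one of n independent Bernoulli(p) transmitters fires.

def slot_success(n, p):
    return n * p * (1 - p) ** (n - 1)

for n in (2, 5, 10):
    best_p = max((i / 1000 for i in range(1, 1000)),
                 key=lambda p: slot_success(n, p))
    print(f"n={n:2d}: optimal p ~ {best_p:.2f}, "
          f"success prob {slot_success(n, best_p):.2f}")
```

The grid search confirms the textbook result that the optimum sits near p = 1/n, so a p tuned for one backlog level degrades as the number of contenders changes.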

Performance portability on EARTH: A case study across several parallel architectures

June 2007


37 Reads

Due to the increasing diversity of parallel architectures and the increasing development time for parallel applications, performance portability has become one of the major considerations when designing the next generation of parallel program execution models, APIs, and runtime system software. This paper analyzes both code portability and performance portability of parallel programs for fine-grained multi-threaded execution and architecture models. We concentrate on one particular event-driven fine-grained multi-threaded execution model—EARTH, and discuss several design considerations of the EARTH model and runtime system that contribute to the performance portability of parallel applications. We believe that these are important issues for future high end computing system software design. Four representative benchmarks were run on several different parallel architectures, including two clusters listed in the 23rd TOP500 supercomputer list. The results demonstrate that EARTH-based programs can achieve robust performance portability across the selected hardware platforms without any code modification or tuning.

Enabling ad-hoc collaboration between mobile users in the project

March 2007


10 Reads

This paper discusses how ad-hoc collaboration boosts the operation of a set of messengers. This discussion continues the research we initiated earlier in the MESSENGER project, which develops data management mechanisms for UDDI registries of Web services using mobile users and software agents. In the current operation mode of messengers, descriptions of Web services are first collected from UDDI registries and later submitted to other UDDI registries. This submission mode of Web services descriptions does not exploit the tremendous opportunities that wireless technologies and mobile devices offer. When mobile devices are “close” to each other, they can form a mobile ad-hoc network that permits the exchange of data between these devices without any pre-existing communication infrastructure. By authorizing messengers to engage in ad-hoc collaboration, additional descriptions of Web services can be collected from other messengers as well. This has several advantages, but at the same time poses several challenges, which in fact highlight the complexity of ad-hoc networks.

Spine routing in ad hoc networks

January 1998


36 Reads

An ad hoc network is a multihop wireless network in which mobile hosts communicate without the support of a wired backbone for routing messages. We introduce a self-organizing network structure called a spine and propose a spine-based routing infrastructure for routing in ad hoc networks. We propose two spine routing algorithms: (a) Optimal Spine Routing (OSR), which uses full and up-to-date knowledge of the network topology, and (b) Partial-knowledge Spine Routing (PSR), which uses partial knowledge of the network topology. We analyze the two algorithms and identify the optimality-overhead trade-offs involved in these algorithms.

Foundations of Security for Hash Chains in Ad Hoc Networks

July 2005


34 Reads

Nodes in ad hoc networks generally transmit data at regular intervals over long periods of time. Recently, ad hoc network nodes have been built that run on little power and have very limited memory. Authentication is a significant challenge in ad hoc networks, even without considering size and power constraints. Expounding on idealized hashing, this paper examines lower bounds for ad hoc broadcast authentication for μTESLA-like protocols. In particular, this paper explores idealized hashing for generating preimages of hash chains. Building on Bellare and Rogaway’s classical definition, a similar definition for families of hash chains is given. Using these idealized families of hash chain functions, this paper gives a time-space product Ω(k² log⁴ n) bit-operation lower bound for optimal preimage hash chain generation for constant k. This bound holds where n is the total length of the hash chain and the hash function family is k-wise independent. These last results follow as corollaries to a lower bound of Coppersmith and Jakobsson.
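
For concreteness, here is a minimal sketch of the hash-chain construction behind μTESLA-style broadcast authentication: the chain is generated by iterating a hash from a random seed, and elements are disclosed in reverse so each one can be verified against an already-trusted value. Note that this naive generator stores the entire chain; the paper's subject is precisely the time-space cost of producing these preimages more cleverly. The chain length is an arbitrary illustration.

```python
# Hash chain: chain[i+1] = H(chain[i]). The final element (the anchor) is
# distributed authentically in advance; preimages are disclosed in reverse,
# and each disclosure is verified with a single hash.
import hashlib, os

def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

n = 8
seed = os.urandom(32)
chain = [seed]
for _ in range(n):
    chain.append(H(chain[-1]))
anchor = chain[-1]

trusted = anchor
for key in reversed(chain[:-1]):      # disclose chain[n-1], chain[n-2], ...
    assert H(key) == trusted          # one hash verifies each new preimage
    trusted = key
print("all", n, "disclosed preimages verified against the anchor")
```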

PHOENIX: A Self Adaptable Monitoring Platform for Cluster Management

January 2002


17 Reads

Distributed systems based on clusters of workstations are increasingly difficult to manage due to the growing number of processors involved and the complexity of the associated applications. Such systems need efficient and flexible monitoring mechanisms to fulfill the requirements of administration services. In this paper, we present PHOENIX, a distributed platform supporting both application and operating system monitoring with a variable granularity. The granularity is defined using logical expressions that specify complex monitoring conditions, and these conditions can be modified dynamically during application execution. The observation techniques are based on automatic probe insertion combined with a system agent, which minimizes PHOENIX's execution-time overhead. The platform's extensibility offers a suitable environment for designing distributed value-added services (performance monitoring, load balancing, accounting, cluster management, etc.).

A Two-level distributed architecture for the support of content adaptation and delivery services

March 2010


181 Reads

The growing demand for Web and multimedia content accessed through heterogeneous devices requires providers to tailor resources to device capabilities on the fly. Providing services for content adaptation and delivery poses two novel challenges for present and future content provider architectures: content adaptation services are computationally expensive, and global storage requirements increase because multiple versions of the same resource may be generated for different client devices. We propose a novel two-level distributed architecture for the support of efficient content adaptation and delivery services. The nodes of the architecture are organized in two levels: thin edge nodes on the first level act as simple request gateways towards the nodes of the second level; fat interior clusters perform all the other tasks, such as content adaptation, caching, and fetching. Several experimental results show that the two-level architecture achieves better performance and scalability than existing flat or non-cooperative architectures. Keywords: Content adaptation, Multimedia resources, Distributed architectures, Performance evaluation.

On the Scalability of Dynamic Scheduling Scientific Applications with Adaptive Weighted Factoring

July 2003


53 Reads

In heterogeneous environments, dynamic scheduling algorithms are a powerful tool for improving the performance of scientific applications via load balancing. However, these scheduling techniques employ heuristics that require prior knowledge about the workload via profiling, resulting in higher overhead as problem sizes and numbers of processors increase. In addition, load imbalance may appear only at run time, making profiling work tedious and sometimes even obsolete. Recently, the integration of dynamic loop scheduling algorithms into a number of scientific applications has proven effective. This paper reports on performance improvements obtained by integrating Adaptive Weighted Factoring, a recently proposed dynamic loop scheduling technique that addresses these concerns, into two scientific applications: computational field simulation on unstructured grids, and N-body simulations. The reported experimental results confirm the benefits of the methodology and emphasize its high potential for future integration into other scientific applications that exhibit substantial performance degradation due to load imbalance.
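
As a rough sketch of the factoring family of dynamic loop schedulers, the code below hands out shrinking batches of the remaining iterations, split among processors in proportion to fixed weights. The halving rule and static weights follow the common textbook form of weighted factoring, not necessarily the exact Adaptive Weighted Factoring rules, which adapt the weights at run time.

```python
# Weighted-factoring-style chunk generation: each round allocates a
# fraction of the remaining iterations, divided by processor weight, so
# faster processors receive proportionally larger chunks.
import math

def weighted_factoring_chunks(total_iters, weights, alpha=0.5):
    remaining, chunks = total_iters, []
    wsum = sum(weights.values())
    while remaining > 0:
        batch = max(1, math.ceil(remaining * alpha))   # shrinking batches
        for proc, w in weights.items():
            size = min(remaining, max(1, round(batch * w / wsum)))
            if size == 0 or remaining == 0:
                continue
            chunks.append((proc, size))
            remaining -= size
    return chunks

# A processor with weight 2.0 receives chunks twice as large as its peers.
print(weighted_factoring_chunks(1000, {"P0": 2.0, "P1": 1.0, "P2": 1.0}))
```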

A mobile agent model for fault-tolerant manipulation on distributed objects

March 2007


17 Reads

In this paper, we discuss how to realize fault-tolerant applications on distributed objects. Servers supporting objects can be made fault-tolerant by taking advantage of replication and checkpointing technologies. However, there has been little discussion of how application programs running on clients can tolerate client faults. For example, servers might block in the two-phase commitment protocol due to a client fault. We discuss how to make application programs fault-tolerant by taking advantage of mobile agent technologies, where a program can move from one computer to another in the network. An application program running on a faulty computer can be resumed on another operational computer by moving the program in the mobile agent model. We present a transactional agent model in which a reliable and efficient application for manipulating objects on multiple computers is realized with mobile agents. In the transactional agent model, only a small part of the application program, named the routing subagent, moves among computers. A routing subagent autonomously decides which computer to visit next, based on a hierarchical navigation map that indicates which computers should be visited before others. Programs that manipulate objects on a computer are loaded onto that computer on arrival of the routing subagent, in order to reduce communication overhead; this part of the transactional agent is a manipulation subagent. The manipulation subagent remains on the computer even after the routing subagent leaves, in order to hold objects until commitment. We assume that any computer may stop due to a fault, while networks are reliable. There are three kinds of faulty computers for a transactional agent: current, destination, and sibling computers, where the transactional agent now exists, will move to, and has visited, respectively. The types of faults are detected by neighbouring manipulation subagents communicating with each other. If some of the manipulation subagents are faulty, the routing subagent has to be aborted; however, the routing subagent may still be moving. We discuss how to efficiently deliver the abort message to the moving routing subagent, and we evaluate the transactional agent model in terms of how long it takes to abort the routing subagent when a computer is faulty.

Condor-G: A Computation Management Agent for Multi-Institutional Grids

January 2002


118 Reads

In recent years, there has been a dramatic increase in the number of available computing and storage resources. Yet few tools exist that allow these resources to be exploited effectively in an aggregated form. We present the Condor-G system, which leverages software from Globus and Condor to enable users to harness multi-domain resources as if they all belong to one personal domain. We describe the structure of Condor-G and how it handles job management, resource selection, security, and fault tolerance. We also present results from application experiments with the Condor-G system. We assert that Condor-G can serve as a general-purpose interface to Grid resources, for use by both end users and higher-level program development tools.
