-
ABSTRACT: Selfish behaviors of individual machines in a grid can potentially damage the performance of the system as a whole. However, scrutinizing the grid by taking into account the noncooperativeness of machines is a largely unexplored research problem. In this paper, we first present a new hierarchical game-theoretic model of the grid that matches well with the physical administrative structure in real-life situations. We then focus on the impact of selfishness in intrasite job execution mechanisms. Based on our novel utility functions, we analytically derive the Nash equilibrium and optimal strategies for the general case. To study the effects of different strategies, we have also performed extensive simulations by using a well-known practical scheduling algorithm over the NAS (numerical aerodynamic simulation) and the PSA (parameter sweep application) workloads. We have studied the overall job execution performance of the grid system under a wide range of parameters. Specifically, we find that the optimal selfish strategy significantly outperforms the Nash selfish strategy. Our performance evaluation results can serve as a valuable reference for designing appropriate strategies in a practical grid.
IEEE Transactions on Parallel and Distributed Systems 06/2007; 18(5):621-636. · 1.40 Impact Factor
-
IJCIS. 01/2006; 2:412-433.
-
IEEE Trans. Computers. 01/2006; 55:703-719.
-
ABSTRACT: Large-scale worm outbreaks that lead to distributed denial-of-service attacks pose a major threat to Internet infrastructure security. Fast worm containment is crucial for minimizing damage and preventing flooding attacks against network hosts.
IEEE Security and Privacy Magazine 06/2005; · 0.90 Impact Factor
-
ABSTRACT: In this paper, our contributions are two-fold: First, we enhance the Min-Min and Sufferage heuristics under three risk modes driven by security concerns. Second, we propose a new Space-Time Genetic Algorithm (STGA) for trusted job scheduling, which is very fast and easy to implement. Under our new model, a job can fail if the site security level is lower than the job security demand. We consider three security-driven heuristic modes: secure, risky, and f-risky. The secure mode always dispatches jobs to secure sites that meet the job security demands. The risky mode allocates jobs to any available resource site, accepting whatever risk it may face. The f-risky mode tries to limit the risk to at most a certain probability f. Our extensive simulation results indicate that the proposed STGA is highly effective in scheduling two types of practical workloads: NAS (Numerical Aerodynamic Simulation) and PSA (Parameter Sweep Application). The STGA outperforms the Min-Min and Sufferage heuristics under the three risk modes across a wide range of performance metrics, including makespan, average response time, site utilization, slowdown ratio, and job failure rate.
Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International; 05/2005
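The three risk modes in the abstract above can be read as a filter on candidate sites applied before a heuristic such as Min-Min picks one. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `Site` class, the `eligible_sites` function, the exponential `failure_probability` model, and its `lam` parameter are all hypothetical, since the abstract only says that a job can fail when the site security level is below the job security demand.

```python
import math
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    security_level: float  # assumed to lie in [0, 1]


def failure_probability(demand: float, level: float, lam: float = 3.0) -> float:
    """Hypothetical risk model (an assumption, not taken from the abstract).

    Zero when the site meets the demand; otherwise it grows with the gap
    between the job's security demand and the site's security level.
    """
    if level >= demand:
        return 0.0
    return 1.0 - math.exp(-lam * (demand - level))


def eligible_sites(sites, demand: float, mode: str, f: float = 0.5):
    """Return the sites a job may be dispatched to under a given risk mode."""
    if mode == "secure":
        # Only sites whose security level meets the job's demand.
        return [s for s in sites if s.security_level >= demand]
    if mode == "risky":
        # Any available site, whatever the risk.
        return list(sites)
    if mode == "f-risky":
        # Sites whose estimated failure probability is at most f.
        return [
            s for s in sites
            if failure_probability(demand, s.security_level) <= f
        ]
    raise ValueError(f"unknown mode: {mode}")
```

With f = 0 the f-risky filter reduces to the secure mode, and with f = 1 it reduces to the risky mode, so the single parameter f spans the range between the two extremes.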
-
ABSTRACT: The USC GridSec project develops distributed security infrastructure and self-defense capabilities to secure wide-area networked resource sites participating in a Grid application. We report new developments in trust modeling, security-binding methodology, and defense architecture against intrusions, worms, and flooding attacks. We propose a novel architectural design of Grid security infrastructure, security binding for enhanced Grid efficiency, distributed collaborative IDS and alert correlation, DHT-based overlay networks for worm containment, and pushback of DDoS attacks. Specifically, we present a new pushback scheme for tracking attack-transit routers and for cutting malicious flows carrying DDoS attacks. We discuss challenging research issues to achieve secure Grid computing effectively in an open Internet environment.
05/2005: pages 121-152;
-
IEEE Internet Computing. 01/2005; 9:24-34.
-
J. Grid Comput. 01/2005; 3:53-73.
-
ABSTRACT: Selfish behaviors of individual machines in a Grid can potentially damage the performance of the system as a whole. However, scrutinizing the Grid by taking into account the non-cooperativeness of machines is a largely unexplored research problem. In this paper, we first present a new hierarchical game-theoretic model of the Grid that matches well with the physical administrative structure in real-life situations. We then focus on the impact of selfishness in intra-site job execution mechanisms. Based on our novel utility functions, we analytically derive the Nash equilibrium and optimal strategies for the general case. To study the effects of different strategies, we have also performed extensive simulations by using a well-known practical scheduling algorithm over the NAS (Numerical Aerodynamic Simulation) and the PSA (Parameter Sweep Application) workloads. We have studied the overall job execution performance of the Grid system under a wide range of parameters. Specifically, we find that the Optimal selfish strategy significantly outperforms the Nash selfish strategy. Our performance evaluation results can serve as a valuable reference for designing appropriate strategies in a practical Grid.
01/2005;
-
ABSTRACT: Building mutual trust among Grid resource sites is crucial to securing distributed Grid applications. We suggest enhancing the trust index of resource sites by upgrading their intrusion defense capabilities and checking the success rate of jobs running on the platforms. We propose a new fuzzy-logic trust model for securing Grid resources. Grid security is enforced through trust update, propagation, and integration across sites. Fuzzy trust integration reduces platform vulnerability and guides the defense deployment across Grid sites. We developed a SeGO scheduler for trusted Grid resource allocation. The SeGO scheduler optimizes the aggregate computing power with security assurance under fixed budget constraints. The effectiveness of the scheme was verified by simulation experiments. Our results show up to 90% enhancement in site security. Compared with no trust integration, our scheme leads to a 114% improvement in the Grid performance/cost ratio. The job drop rate is reduced by 75%. The utilization of Grid resources increases to 92.6% as more jobs are submitted. These results demonstrate significant performance gains through optimized resource allocation and aggressive security reinforcement.
10/2004: pages 9-21;
-
ABSTRACT: This paper investigates a novel streaming architecture consisting of home-to-home online (H2O) devices that collaborate with one another to provide on-demand access to large repositories of continuous media such as audio and video clips. An H2O device is configured with a high-bandwidth wireless communication component, a powerful processor, and gigabytes of storage. A key challenge of this environment is how to place data across H2O devices in order to enhance startup latency, defined as the delay observed from when a user requests a clip to the onset of its display. Our primary contribution is a novel replication technique that enhances startup latency while minimizing the total storage space required from an environment consisting of N H2O devices. This technique is based on the following intuition: The first few blocks of a clip are required more urgently than its last few blocks, and should be replicated more frequently in order to minimize startup latency. We develop analytical models to quantify the number of replicas required for each block. In addition, we describe two alternative distributed implementations of our replication strategy. When compared with full replication, our technique provides on average greater than 97% (i.e., several orders of magnitude) savings in storage space, while ensuring zero startup latency and a hiccup-free reception.
IEEE Transactions on Multimedia 05/2004; · 1.93 Impact Factor
-
Proceedings of the ISCA 17th International Conference on Parallel and Distributed Computing Systems, September 15-17, 2004, The Canterbury Hotel, San Francisco, California, USA; 01/2004
-
ABSTRACT: Realistic platforms for Grid computing face security threats from network attacks. Heterogeneous clusters in the open Grid are likely to work in different autonomous domains (ADs). Grid jobs dispatched across the ADs are thus subject to unexpected failures or long delays due to wide-area insecurity. This hinders Grid job scheduling and outsourcing to remote sites. Unfortunately, this problem was largely ignored in the past. In this paper, we close the gap by specifying several risk modes to model various levels of risky conditions in Grid sites. Then we propose three resilient strategies (preemptive, replication, and delay-tolerant) for designing security-assured heuristic scheduling algorithms. The relative performance of these algorithms is evaluated with the NAS and PSA benchmarks. We measure the makespan, average turnaround time, Grid utilization, slowdown ratio, and job failure rate to evaluate the heuristic algorithms. Kiviat graphs are used to demonstrate that the two delay-tolerant algorithms achieve the highest performance. The two replication algorithms rank next, followed by the two preemptive algorithms. The conservative algorithm has the lowest performance. These findings suggest that it is more resilient for the global job scheduler to tolerate job delays by calculated risky conditioning, instead of resorting to job preemption, replication, or assuming unrealistic risk-free operations.
-
ABSTRACT: Problem. Large-scale worm outbreaks are one of the major security threats to today's Internet. Network worms exploit the vulnerabilities of widely deployed homogeneous software to self-propagate quickly. Moore et al. [3] show that the reaction time for worm containment is only a few minutes and that signature-based filtering is more efficient than source-address filtering. Recent work on Earlybird [4] and Autograph [1] suggests that it is promising to automatically detect worm signatures by analyzing their content prevalence and address dispersion. However, most scanning worms will be dispersed across the whole Internet when they start to spread. In the early stage of worm spreading, it is difficult to accumulate enough payloads to generate precise signatures in individual edge networks. Approach. To solve this problem, we propose to design and implement a distributed worm signature detection and dissemination system (WormShield) that collaboratively analyzes worm activities in multiple administrative domains. In WormShield, all monitors deployed in edge networks self-organize into a distributed hash table (DHT) overlay network. Instead of only sharing port-scanning information like Autograph, WormShield monitors collaboratively analyze the global prevalence of payload contents and their address dispersion using distributed aggregation trees (DAT) built on top of the Chord [5] overlay. Each monitor first partitions packet payloads into small content blocks using the Rabin fingerprint algorithm. It then updates the local prevalence for each content block as well as its source and destination addresses. If the local prevalence and address dispersion exceed the monitor's local thresholds, the monitor starts to update the global prevalence and address dispersion for this content block. A root monitor is selected for each content block to aggregate its global prevalence and address dispersion using the same consistent hashing scheme as in Chord.
A distributed aggregation tree (DAT) rooted at this monitor will be implicitly constructed by using the Chord routing paths from all other monitors to the root monitor. Since the Chord routing paths are at most O(log N) hops, the height of the DAT will be at most O(log N) as well. Each monitor in the DAT will receive updates from its children and send the aggregated information to its parent. This approach can significantly reduce update hotspots at root monitors, since they only receive updates from their O(log N) direct children instead of from all O(N) monitors as in the direct update approach. Figure 1 shows an example WormShield network with six nodes. A content block will be identified as a potential worm signature once its global prevalence and address dispersion are greater than the global thresholds. Its root monitor then constructs a multicast tree on top of the Chord overlay and disseminates the signature to all other monitors. The monitors could automatically deploy the received signature in their local signature-based intrusion detection systems, or apply other policies such as notifying the local security administrator. Challenges. There are several challenging problems in WormShield design and implementation. First, it is not practical to aggregate the global information for all content blocks seen in the network because of processing and bandwidth limitations. As shown by Earlybird [4] in a single site, over 97 percent of all content blocks repeat two or fewer times and 94.5 percent are only observed once. Therefore, only 3% of all content blocks are subject to global aggregation if we set the local prevalence threshold to 2. We suspect that the content prevalence and address dispersion conform to a Zipf distribution. We will verify this hypothesis by analyzing real-world trace files.
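The local filtering step and the root selection described in this abstract can be illustrated with a small sketch. It is written under several assumptions and is not the WormShield implementation: fixed-size blocks hashed with SHA-1 stand in for Rabin fingerprinting, a single dispersion threshold is applied to both source and destination counts, and the names `LocalMonitor`, `observe`, `root_monitor`, and `BLOCK_SIZE` are hypothetical.

```python
import bisect
import hashlib
from collections import defaultdict

BLOCK_SIZE = 8  # assumption: fixed-size blocks instead of Rabin fingerprints


class LocalMonitor:
    """Tracks local prevalence and address dispersion per content block."""

    def __init__(self, prevalence_threshold=2, dispersion_threshold=2):
        self.prevalence_threshold = prevalence_threshold
        self.dispersion_threshold = dispersion_threshold
        self.count = defaultdict(int)    # block digest -> times seen
        self.sources = defaultdict(set)  # block digest -> distinct sources
        self.dests = defaultdict(set)    # block digest -> distinct destinations

    def observe(self, payload: bytes, src: str, dst: str):
        """Update local statistics for one packet.

        Returns the digests of blocks whose local prevalence and address
        dispersion both exceed the local thresholds, i.e. the blocks that
        would be reported for global aggregation.
        """
        flagged = []
        for i in range(0, len(payload) - BLOCK_SIZE + 1, BLOCK_SIZE):
            digest = hashlib.sha1(payload[i:i + BLOCK_SIZE]).hexdigest()
            self.count[digest] += 1
            self.sources[digest].add(src)
            self.dests[digest].add(dst)
            if (self.count[digest] > self.prevalence_threshold
                    and len(self.sources[digest]) > self.dispersion_threshold
                    and len(self.dests[digest]) > self.dispersion_threshold):
                flagged.append(digest)
        return flagged


def root_monitor(key: int, sorted_ids):
    """Consistent hashing as in Chord: the root for a key is its successor,
    the first monitor ID at or after the key, wrapping around the ring."""
    i = bisect.bisect_left(sorted_ids, key)
    return sorted_ids[i % len(sorted_ids)]
```

In this sketch a block seen once or twice never leaves the local monitor, which mirrors the abstract's argument that a local prevalence threshold of 2 keeps most content blocks out of global aggregation.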
-
ABSTRACT: Internet catastrophes could be caused by large-scale worm outbreaks that lead to DDoS flooding attacks. Internet worms can be exploited to damage infected hosts and launch flooding attacks against high-profile Internet services. We suggest deploying distributed WormShield monitors to automatically detect and disseminate worm signatures. WormShield monitors collaboratively analyze the global prevalence and address dispersion of worm signatures, using distributed hash table (DHT) overlays built on top of multiple edge networks. We simulated CodeRed-like worms on an Internet configuration of 105,246 edge networks and 338,562 vulnerable hosts. The results show that collaborative monitors detect worm signatures about 10 times faster than independent monitors. This results in a 27-fold reduction in infected hosts when 1% of the vulnerable edge networks are monitored. A low-complexity traffic monitoring scheme is developed to track DDoS flooding attacks caused by worms. The article also assesses several worm research projects in academia and industry.