Questions and Answers (1)
Answer added in Computer Science, to the question "compare the best Journal"
Sanjay Bansal: Both journals are eminent journals.
Publications (6)
Article: An Improved Multiple Faults Reassignment based Recovery in Cluster Computing
Sanjay Bansal, Sanjeev Sharma
ABSTRACT: With multiple node failures, performance is much lower than with a single node failure. Node failures in cluster computing can be tolerated by multiple-fault-tolerant computing. Existing recovery schemes are efficient for a single fault but not for multiple faults. The recovery scheme proposed in this paper has two phases: a sequential phase and a concurrent phase. In the sequential phase, the loads of all working nodes are distributed evenly by the proposed dynamic rank-based load distribution algorithm. In the concurrent phase, the loads of all failed nodes, as well as newly arriving jobs, are assigned to the available nodes by the failed-node job allocation algorithm, which simply finds the least loaded node among them. Combining sequential and concurrent execution of the algorithms improves performance and resource utilization. The dynamic rank-based algorithm for load redistribution works as a sequential restoration algorithm, and the reassignment algorithm, which distributes the jobs of failed nodes to the least loaded computing nodes, works as a concurrent recovery reassignment algorithm. Performance improves because load is distributed evenly among all available working nodes with fewer iterations, less iteration time, and lower communication overhead. The dynamic ranking algorithm is a low-overhead, high-convergence algorithm for reassigning tasks uniformly among all available nodes. Jobs of failed nodes are reassigned by a low-overhead, efficient failure job allocation algorithm. Test results showing the effectiveness of the proposed scheme are presented.
02/2011
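The least-loaded assignment step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `reassign_failed_jobs`, the numeric load units, and the choice to place larger jobs first are all assumptions made for illustration.

```python
import heapq

def reassign_failed_jobs(loads, failed_jobs):
    """Give each orphaned job to the currently least loaded surviving node.

    loads: dict of node name -> current load (hypothetical units)
    failed_jobs: iterable of job costs taken from failed nodes
    Returns (new_loads, placement); placement is a list of (cost, node) pairs.
    """
    if not loads:
        raise ValueError("no surviving nodes to receive jobs")
    # A min-heap keyed on load makes "find the least loaded node" O(log n).
    heap = [(load, node) for node, load in loads.items()]
    heapq.heapify(heap)
    placement = []
    # Assumption: largest jobs are placed first, which tends to even out the result.
    for cost in sorted(failed_jobs, reverse=True):
        load, node = heapq.heappop(heap)
        placement.append((cost, node))
        heapq.heappush(heap, (load + cost, node))
    return {node: load for load, node in heap}, placement

new_loads, placement = reassign_failed_jobs({"a": 5, "b": 2, "c": 3}, [4, 1, 2])
```

With the example inputs, the three orphaned jobs go to `b`, `c`, and `a` in turn, leaving the surviving nodes with loads of 6, 6, and 5.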
Article: An Overview of Portable Distributed Techniques
Sanjay Bansal, Nirved Pandey
ABSTRACT: In this paper, we review several portable parallel programming paradigms for use in a distributed programming environment. The techniques reviewed, all of them portable, are mainly distributed computing using pure-Java MPI, native Java MPI (via JNI), and PVM. We discuss the architecture and utilities of each technique based on our literature review. We examine these portable distributed techniques with respect to four important characteristics: scalability, fault tolerance, load balancing, and performance. We identify the factors and issues involved in improving these four characteristics.
CoRR. 01/2011; abs/1101.2573.
Chapter: A Novel Stair-Case Replication (SCR) Based Fault Tolerance for MPI Applications
Sanjay Bansal, Sanjeev Sharma, Ishita Trivedi
ABSTRACT: As computational clusters increase in size, their mean time to failure drops drastically. Checkpointing is generally used to minimize the loss of computation. Most checkpointing techniques, however, require central storage for storing checkpoints. This creates a bottleneck and severely limits the scalability of checkpointing, while dedicated checkpointing networks and storage systems prove too expensive. We propose a Stair-Case Replication (SCR) based MPI checkpointing facility. Our reference implementation is based on LAM/MPI; however, it is directly applicable to any MPI implementation. We use the staircase method of fault-tolerant MPI with asynchronous replication, eliminating the need for central or network storage. We evaluate centralized storage, a Sun-X4500-based solution, an EMC storage area network (SAN), and the Ibrix commercial parallel file system and show that they are not scalable, particularly beyond 64 CPUs. We use the staircase MPI method, which allows the access point in a lower complexity level to the higher complexity level, which improves the efficiency of the previous method.
Keywords: checkpointing, fault tolerance, MPI, SAN
12/2010: pages 445-448
Source: Available from cisjournal.org
Article: A Multiple Fault Tolerant Approach with Improved Performance in Cluster Computing
ABSTRACT: With multiple node failures, performance is much lower than with a single node failure. Node failures in cluster computing can be tolerated by multiple-fault-tolerant computing. In this paper, we propose a multiple-fault-tolerant technique with improved failure detection and performance. Failure detection uses an improved adaptive heartbeat-based algorithm to increase the degree of confidence and accuracy. Failure recovery is based on reassigning load with a rank-based algorithm. Performance is achieved by distributing the load among all available nodes with a dynamic rank-based balancing algorithm. The dynamic ranking algorithm is a low-overhead algorithm for reassigning tasks uniformly among all available nodes. Message logging is used to recover lost messages.
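As a rough illustration of what adaptive heartbeat-based failure detection can look like, the sketch below derives its timeout from recently observed heartbeat intervals. It is a generic approach, not the improved algorithm proposed in the paper: the class name `AdaptiveHeartbeatDetector`, the sliding window size, and the mean-plus-deviations timeout rule are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev

class AdaptiveHeartbeatDetector:
    """Suspects a node when its heartbeat is overdue relative to recent history."""

    def __init__(self, window=10, safety_factor=3.0, initial_timeout=1.0):
        self.intervals = deque(maxlen=window)   # recent inter-arrival times
        self.safety_factor = safety_factor      # hypothetical tuning knob
        self.initial_timeout = initial_timeout  # used until history exists
        self.last_arrival = None

    def heartbeat(self, now):
        """Record a heartbeat received at time `now` (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def timeout(self):
        """Adaptive timeout: mean interval plus a multiple of its spread."""
        if len(self.intervals) < 2:
            return self.initial_timeout
        return mean(self.intervals) + self.safety_factor * pstdev(self.intervals)

    def suspected(self, now):
        """True if the time since the last heartbeat exceeds the timeout."""
        if self.last_arrival is None:
            return False
        return now - self.last_arrival > self.timeout()

detector = AdaptiveHeartbeatDetector()
for t in (0.0, 1.0, 2.0, 3.0):
    detector.heartbeat(t)
```

Because the timeout widens when heartbeat arrivals are irregular and tightens when they are steady, a detector of this kind trades detection speed against false suspicions automatically rather than through a fixed constant.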
Source: Available from ijidcs.org
Article: A Detailed Review of Fault-Tolerance Techniques in Distributed System
Sanjay Bansal, Sanjeev Sharma, Ishita Trivedi
ABSTRACT: In this paper, we survey various fault tolerance techniques and related issues in distributed systems. More specifically, we discuss the two most important issues: multiple-fault handling capability and performance. The survey presents related research results, explores future directions for fault tolerance techniques, and serves as a reference for researchers.
(IJIDCS) International Journal on Internet and Distributed Computing Systems. 1.