Conference Paper

BackupIT: An Intrusion-Tolerant Cooperative Backup System

Authors:

Abstract

Reliable storage of large amounts of data is always a delicate issue. Availability, efficiency, data integrity, and confidentiality are some features a data backup system should provide. At the same time, corporate computers offer spare disk space and unused networking resources. In this paper, we propose an intrusion-tolerant cooperative backup system that provides a reliable collaborative backup resource by leveraging these independent, distributed resources. This system makes efficient use of network and storage resources through compression, encryption, and efficient verification processes. It also implements a protocol to tolerate Byzantine behaviors, when nodes arbitrarily deviate from their specifications. Experiments performed to evaluate the proposal showed its viability.


... The definition of peers with specific functionality in the network differs from some proposals for P2P systems, in which each peer should be able to play all the roles, thus promoting the idea of a DHT (Distributed Hash Table) [13,11]. However, an implementation based on a DHT is, in essence, quite costly and difficult to scale [12]. ...
Conference Paper
Cloud computing is a computing model where hardware, platforms and software are seen as services; viz. Infrastructure as a Service, Platform as a Service, and Software as a Service, respectively. Data as a Service (DaaS) is based on the concept that the product, data in this case, can be provided on demand to the user, regardless of geographic or organizational separation between provider and consumer. DaaS applications are for the most part based on excessive data replication in order to guarantee data availability, which means excessive costs in hardware investments. This white paper presents the specification, implementation and evaluation of a system called USTO.RE which aims to be an effective and low-cost alternative for storing data, thereby mitigating the problem of excessive data replication and thus allows itself to be considered a reliable platform from the perspective of data availability. Evaluation scenarios and the results achieved in our experiments to evaluate the system as well as possible lines for future development will be presented.
... With the growth in the amount of data produced by home users through cameras, pen drives, music, films, and other devices that produce multimedia resources, as well as by production systems in companies that need to store historical data, the need arises for data retention platforms [1,2]. ...
Conference Paper
This article presents the specification and implementation of a tool that aims to be an alternative to the problem of excessive data replication in order to be considered a reliable platform from the perspective of data availability. The tool works with the concept of data federation. The validation scenarios are presented, along with the results achieved.
Chapter
With increasing connection speeds and the evolution of Web systems, Internet-based systems have emerged that are commonly called cloud computing. The term designates a support platform that provides management, on-demand use, adaptation to requirements, rational use of resources, and automation of the processes involved in creating infrastructure. In this context, systems have emerged for cloud data storage, database scalability, and data search and retrieval, now referred to as Big Data. This short course addresses these topics, illustrating how such platforms are implemented and used.
Article
Storage capacity, like computing power, follows its own Moore's law and grows dramatically. Consequently, the need for data backup services increases. The domain of data backup was recently hit by the peer-to-peer wave: several projects propose to use a cooperative approach to the backup problem. In this paper we survey these projects, the various problems they face and the techniques they propose to tackle them.
Conference Paper
This paper describes a general approach to constructing cooperative services that span multiple administrative domains. In such environments, protocols must tolerate both Byzantine behaviors, when broken, misconfigured, or malicious nodes arbitrarily deviate from their specification, and rational behaviors, when selfish nodes deviate from their specification to increase their local benefit. The paper makes three contributions: (1) It introduces the BAR (Byzantine, Altruistic, Rational) model as a foundation for reasoning about cooperative services; (2) It proposes a general three-level architecture to reduce the complexity of building services under the BAR model; and (3) It describes an implementation of BAR-B, the first cooperative backup service to tolerate both Byzantine users and an unbounded number of rational users. At the core of BAR-B is an asynchronous replicated state machine that provides the customary safety and liveness guarantees despite nodes exhibiting both Byzantine and rational behaviors. Our prototype provides acceptable performance for our application: our BAR-tolerant state machine executes 15 requests per second, and our BAR-B backup service can back up 100MB of data in under 4 minutes.
Article
The consensus problem involves an asynchronous system of processes, some of which may be unreliable. The problem is for the reliable processes to agree on a binary value. In this paper, it is shown that every protocol for this problem has the possibility of nontermination, even with only one faulty process. By way of contrast, solutions are known for the synchronous case, the “Byzantine Generals” problem.
Conference Paper
Replication is a mechanism extensively used to guarantee the availability and good performance of data storage services. Byzantine Quorum Systems (BQS) have been proposed as a solution to guarantee the consistency of that kind of services, even if some of the replicas fail arbitrarily. Many BQS have been proposed recently, but comparing their performance is not simple. In fact, it has been shown that theoretical metrics like the number of steps or communication rounds say as much about the practical performance of distributed algorithms as they hide. This paper presents a comparative evaluation of several BQS algorithms in the literature. The evaluation is based both on experiments and simulations. For that purpose, a framework for evaluating BQS called BQSNeko was developed. The results of the evaluation allow a better understanding of the algorithms and the tradeoffs involved.
Article
Backup is cumbersome and expensive. Individual users almost never back up their data, and backup is a significant cost in large organizations. This paper presents Pastiche, a simple and inexpensive backup system. Pastiche exploits excess disk capacity to perform peer-to-peer backup with no administrative costs. Each node minimizes storage overhead by selecting peers that share a significant amount of data. It is easy for common installations to find suitable peers, and peers with high overlap can be identified with only hundreds of bytes. Pastiche provides mechanisms for confidentiality, integrity, and detection of failed or malicious peers. A Pastiche prototype suffers only 7.4% overhead for a modified Andrew Benchmark, and restore performance is comparable to cross-machine copy.
Article
This paper presents the design and implementation of a cooperative off-site backup system, Venti-DHash. Venti-DHash is based on a DHT infrastructure and is designed to support recovery of data after a disaster by keeping regular snapshots of file systems distributed off-site, on peers on the Internet. Whereas conventional backup systems incur significant equipment costs, manual effort and high administrative overhead, we hope that a distributed backup system can alleviate these problems, making backups easy and feasible. By building this system on top of a DHT, the backup application inherits the properties of the DHT, and serves to evaluate the feasibility of using a DHT to build large scale applications.
Article
Most structured peer-to-peer overlays rely on consistent hashing to determine the node that is responsible for a given key. For consistent hashing to work properly, it is necessary that the nodes have a consistent view of their neighborhood in the identifier space. However, if routing anomalies occur in the underlying network, this view can become inconsistent, causing unstable overlay behavior and, worse, allowing more than one node to assume responsibility for ranges of keys. We present a set of techniques for preventing inconsistencies under routing anomalies, and we propose to adopt strategies from mobile ad-hoc networking for maintaining connectivity in the presence of path failures. We evaluate our design in the context of Pastry and present results from a deployment in the PlanetLab testbed.
Article
Many personal computers are operated with no backup strategy for protecting data in the event of loss or failure. At the same time, PCs are likely to contain spare disk space and unused networking resources. We present the Apportioned Backup System (ABS), which provides a reliable collaborative backup resource by leveraging these independent, distributed resources. With ABS, procuring and maintaining specialized backup hardware is unnecessary. ABS makes efficient use of network and storage resources through use of coding techniques, convergent encryption and storage, and efficient versioning and verification processes. The system also painlessly accommodates dynamic expansion of system compute, storage, and network resources, and is tolerant of catastrophic node failures.
Article
A major hurdle to deploying a distributed storage infrastructure in peer-to-peer systems is storing data reliably using nodes that have little incentive to remain in the system. We argue that a node should choose its neighbors (the nodes with which it shares resources) based on existing social relationships instead of randomly. This approach provides incentives for nodes to cooperate and results in a more stable system which, in turn, reduces the cost of maintaining data. The cost of this approach is decreased flexibility and storage utilization. We describe our approach and sketch two applications for which this approach is viable: a cooperative backup system and a Usenet replacement.
Article
Quorum systems are well-known tools for ensuring the consistency and availability of replicated data despite the benign failure of data repositories. In this paper we consider the arbitrary (Byzantine) failure of data repositories and present the first study of quorum system requirements and constructions that ensure data availability and consistency despite these failures. We also consider the load associated with our quorum systems, i.e., the minimal access probability of the busiest server. For services subject to arbitrary failures, we demonstrate quorum systems over n servers with a load of O(1/√n), thus meeting the lower bound on load for benignly fault-tolerant quorum systems. We explore several variations of our quorum systems and extend our constructions to cope with arbitrary client failures.
Conference Paper
Byzantine quorum systems have been proposed that work properly even when up to f replicas fail arbitrarily. However, these systems are not so successful when confronted with Byzantine faulty clients. This paper presents novel protocols that provide atomic semantics despite Byzantine clients. Our protocols prevent Byzantine clients from interfering with good clients: bad clients cannot prevent good clients from completing reads and writes, and they cannot cause good clients to see inconsistencies. In addition we also prevent bad clients that have been removed from operation from leaving behind more than a bounded number of writes that could be done on their behalf by a colluder. Our protocols are designed to work in an asynchronous system like the Internet and they are highly efficient. We require 3f +1 replicas, and either two or three phases to do writes; reads normally complete in one phase and require no more than two phases, no matter what the bad clients are doing. We also present strong correctness conditions for systems with Byzantine clients that limit what can be done on behalf of bad clients once they leave the system. Furthermore we prove that our protocols are both safe (they meet those conditions) and live.
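The replica requirement quoted above can be checked with a little arithmetic. The following is a minimal sketch, not the paper's protocol: the function names are hypothetical, and the quorum size of 2f + 1 is an assumption (the abstract states only the 3f + 1 replica requirement). Under that assumption, any two quorums overlap in at least f + 1 replicas, so the overlap always contains at least one correct replica.

```python
def min_replicas(f: int) -> int:
    """Replicas needed to tolerate f Byzantine replicas (3f + 1, per the abstract)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def quorum_size(f: int) -> int:
    """Assumed quorum size of 2f + 1; this figure is not stated in the abstract."""
    return 2 * f + 1


def min_quorum_overlap(f: int) -> int:
    """Smallest possible intersection of two quorums drawn from all replicas."""
    n = min_replicas(f)
    q = quorum_size(f)
    # Two sets of size q inside a universe of size n share at least 2q - n members.
    return 2 * q - n


if __name__ == "__main__":
    for f in range(1, 4):
        print(f, min_replicas(f), quorum_size(f), min_quorum_overlap(f))
```

For f = 1 this gives 4 replicas, quorums of 3, and an overlap of at least 2, one more than the number of replicas that may be faulty.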
Conference Paper
Quorum systems are well-known tools for ensuring the consistency and availability of replicated data despite the benign failure of data repositories. In this paper we consider the arbitrary (Byzantine) failure of data repositories and present the first study of quorum system requirements and constructions that ensure data availability and consistency despite these failures. We also consider the load associated with our quorum systems, i.e., the minimal access probability of the busiest server. For services subject to arbitrary failures, we demonstrate quorum systems over n servers with a load of O(1/√n), thus meeting the lower bound on load for benignly fault-tolerant quorum systems. We explore several variations of our quorum systems and extend our constructions to cope with arbitrary client failures.
Conference Paper
This paper presents the design and evaluation of Pastry, a scalable, distributed object location and routing substrate for wide-area peer-to-peer applications. Pastry performs application-level routing and object location in a potentially very large overlay network of nodes connected via the Internet. It can be used to support a variety of peer-to-peer applications, including global data storage, data sharing, group communication and naming. Each node in the Pastry network has a unique identifier (nodeId). When presented with a message and a key, a Pastry node efficiently routes the message to the node with a nodeId that is numerically closest to the key, among all currently live Pastry nodes. Each Pastry node keeps track of its immediate neighbors in the nodeId space, and notifies applications of new node arrivals, node failures and recoveries. Pastry takes into account network locality; it seeks to minimize the distance messages travel, according to a scalar proximity metric like the number of IP routing hops. Pastry is completely decentralized, scalable, and self-organizing; it automatically adapts to the arrival, departure and failure of nodes. Experimental results obtained with a prototype implementation on an emulated network of up to 100,000 nodes confirm Pastry’s scalability and efficiency, its ability to self-organize and adapt to node failures, and its good network locality properties.
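The routing target described above, the live node whose nodeId is numerically closest to the key, can be illustrated with a small sketch. It computes only the destination, not Pastry's hop-by-hop routing, and it rests on assumptions not taken from the abstract: a circular 128-bit identifier space and a tie-break toward the smaller nodeId. The names `ID_BITS`, `ring_distance`, and `closest_node` are hypothetical and are not Pastry's API.

```python
from typing import Sequence

ID_BITS = 128            # assumed identifier width
ID_SPACE = 1 << ID_BITS  # size of the assumed circular identifier space


def ring_distance(a: int, b: int) -> int:
    """Numerical distance between two identifiers on the circular space."""
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def closest_node(key: int, live_nodes: Sequence[int]) -> int:
    """Return the live nodeId numerically closest to key (ties: smaller nodeId)."""
    if not live_nodes:
        raise ValueError("no live nodes")
    return min(live_nodes, key=lambda n: (ring_distance(n, key), n))


if __name__ == "__main__":
    nodes = [0x10, 0x80, 0xF0]
    print(hex(closest_node(0x7A, nodes)))  # prints 0x80
```

A key near the top of the identifier space wraps around: with the same three nodes, the key `ID_SPACE - 1` maps to `0x10`, because the distance is measured around the ring.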
Conference Paper
Contributory applications allow users to donate unused resources on their personal computers to a shared pool. Applications such as SETI@home, Folding@home, and Freenet are now in wide use and provide a variety of services, including data processing and content distribution. However, while several research projects have proposed contributory applications that support peer-to-peer storage systems, their adoption has been comparatively limited. We believe that a key barrier to the adoption of contributory storage systems is that contributing a large quantity of local storage interferes with the principal user of the machine. To overcome this barrier, we introduce the Transparent File System (TFS). TFS provides background tasks with large amounts of unreliable storage (all of the currently available space) without impacting the performance of ordinary file access operations. We show that TFS allows a peer-to-peer contributory storage system to provide 40% more storage at twice the performance when compared to a user-space storage mechanism. We analyze the impact of TFS on replication in peer-to-peer storage systems and show that TFS does not appreciably increase the resources needed for file replication.
Conference Paper
In this paper, we explore the feasibility of using data redundancy, a model of dependent host vulnerabilities, and distributed storage to tolerate such events. In particular, we motivate the design of a cooperative, distributed remote backup system called the Phoenix recovery system. The usage model of Phoenix is straightforward: a user specifies an amount of bytes from its disk space the system can use, and the goal of the system is to protect a proportional amount of its data using storage provided by...
Conference Paper
Backup is cumbersome and expensive. Individual users almost never back up their data, and backup is a significant cost in large organizations. This paper presents Pastiche, a simple and inexpensive backup system. Pastiche exploits excess disk capacity to perform peer-to-peer backup with no administrative costs. Each node minimizes storage overhead by selecting peers that share a significant amount of data. It is easy for common installations to find suitable peers, and peers with high overlap can be identified with only hundreds of bytes. Pastiche provides mechanisms for confidentiality, integrity, and detection of failed or malicious peers. A Pastiche prototype suffers only 7.4% overhead for a modified Andrew Benchmark, and restore performance is comparable to cross-machine copy.
Conference Paper
Backup is cumbersome. To be effective, backups have to be made at regular intervals, forcing users to organize and store a growing collection of backup media. In this paper we propose a novel peer-to-peer backup system, PeerStore, that allows users to store their backups on other people's computers instead. PeerStore is an adaptive, cost-effective system suitable for all types of networks, ranging from LANs and WANs to large unstable networks like the Internet. The system consists of two layers: a metadata layer and a symmetric trading layer. Locating blocks and checking for duplicates is accomplished by the metadata layer, while the actual data distribution is done between pairs of peers after they have established a symmetric data trade. By decoupling metadata management from data storage, the system offers a significant reduction of the maintenance cost and preserves fairness among peers. Results show that PeerStore has a reduced maintenance cost compared to pStore. PeerStore also realizes fairness because of the symmetric nature of the trades.
Article
Peer-to-peer storage systems assume that their users consume resources in proportion to their contribution. Unfortunately, users are unlikely to do this without some enforcement mechanism. Prior solutions to this problem require centralized infrastructure, constraints on data placement, or ongoing administrative costs. All of these run counter to the design philosophy of peer-to-peer systems. Samsara enforces fairness in peer-to-peer storage systems without requiring trusted third parties, symmetric storage relationships, monetary payment, or certified identities. Each peer that requests storage of another must agree to hold a claim in return, a placeholder that accounts for available space. After an exchange, each partner checks the other to ensure faithfulness. Samsara punishes unresponsive nodes probabilistically. Because objects are replicated, nodes with transient failures are unlikely to suffer data loss, unlike those that are dishonest or chronically unavailable. Claim storage overhead can be reduced when necessary by forwarding among chains of nodes, and eliminated when cycles are created. Forwarding chains increase the risk of exposure to failure, but such risk is modest under reasonable assumptions of utilization and simultaneous, persistent failure.
Article
We present a novel peer-to-peer backup technique that allows computers connected to the Internet to back up their data cooperatively: Each computer has a set of partner computers, which collectively hold its backup data. In return, it holds a part of each partner's backup data. By adding redundancy and distributing the backup data across many partners, a highly-reliable backup can be obtained in spite of the low reliability of the average Internet machine.
Article
In an effort to combine research in peer-to-peer systems with techniques for incremental backup systems, we propose pStore: a secure distributed backup system based on an adaptive peer-to-peer network. pStore exploits unused personal hard drive space attached to the Internet to provide the distributed redundancy needed for reliable and effective data backup. Experiments on a 30 node network show that 95% of the files in a 13 MB dataset can be retrieved even when 7 of the nodes have failed. On top of this reliability, pStore includes support for file encryption, versioning, and secure sharing. Its custom versioning system permits arbitrary version retrieval similar to CVS. pStore provides this functionality at less than 10% of the network bandwidth and requires 85% less storage capacity than simpler local tape backup schemes for a representative workload.
Article
In this paper, we explore the feasibility of using data redundancy, a model of dependent host vulnerabilities, and distributed storage to tolerate such events. In particular, we motivate the design of a cooperative, distributed remote backup system called the Phoenix recovery system. The usage model of Phoenix is straightforward: a user specifies an amount of bytes from its disk space the system can use, and the goal of the system is to protect a proportional amount of its data using storage provided by other hosts.
Conference Paper
This paper presents the design and evaluation of Pastry, a scalable, distributed object location and routing substrate for wide-area peer-to-peer applications. Pastry performs application-level routing and object location in a potentially very large overlay network of nodes connected via the Internet. It can be used to support a variety of peer-to-peer applications, including global data storage, data sharing, group communication and naming. Each node in the Pastry network has a unique identifier (nodeId). When presented with a message and a key, a Pastry node efficiently routes the message to the node with a nodeId that is numerically closest to the key, among all currently live Pastry nodes. Each Pastry node keeps track of its immediate neighbors in the nodeId space, and notifies applications of new node arrivals, node failures and recoveries. Pastry takes into account network locality; it seeks to minimize the distance messages travel, according to a scalar proximity metric like the number of IP routing hops. Pastry is completely decentralized, scalable, and self-organizing; it automatically adapts to the arrival, departure and failure of nodes. Experimental results obtained with a prototype implementation on an emulated network of up to 100,000 nodes confirm Pastry’s scalability and efficiency, its ability to self-organize and adapt to node failures, and its good network locality properties.
E. Hsu, J. Mellen, and P. Naresh. DIBS: distributed backup for local area networks. Technical report, Parallel & Distributed Operating Systems Group, MIT, 2004.
C. Batten, K. Barr, A. Saraf, and S. Treptin. pStore: A secure peer-to-peer backup system. Technical Report TM-632, MIT Laboratory for Computer Science, 2002.
L. P. Cox, C. D. Murray, and B. D. Noble. Pastiche: making backup cheap and easy. SIGOPS Operating Systems Review, 36:285-298, 2002.
J. Li and F. Dabek. F2F: Reliable storage in open networks. In Intl Workshop on Peer-to-Peer Systems, Santa Barbara CA, Feb. 2006.
M. Lillibridge, S. Elnikety, A. Birrell, M. Burrows, and M. Isard. A cooperative internet backup scheme. In USENIX Annual Technical Conference, 2003.
E. Sit, J. Cates, and R. Cox. A DHT-based backup system. In 1st IRIS Student Workshop, 2003.