Conference Paper

# Pastry: Scalable, Decentralized Object Location and Routing for Large-Scale Peer-to-Peer Systems

Authors: Antony Rowstron and Peter Druschel

## Abstract

This paper presents the design and evaluation of Pastry, a scalable, distributed object location and routing substrate for wide-area peer-to-peer applications. Pastry performs application-level routing and object location in a potentially very large overlay network of nodes connected via the Internet. It can be used to support a variety of peer-to-peer applications, including global data storage, data sharing, group communication and naming. Each node in the Pastry network has a unique identifier (nodeId). When presented with a message and a key, a Pastry node efficiently routes the message to the node with a nodeId that is numerically closest to the key, among all currently live Pastry nodes. Each Pastry node keeps track of its immediate neighbors in the nodeId space, and notifies applications of new node arrivals, node failures and recoveries. Pastry takes into account network locality; it seeks to minimize the distance messages travel, according to a scalar proximity metric like the number of IP routing hops. Pastry is completely decentralized, scalable, and self-organizing; it automatically adapts to the arrival, departure and failure of nodes. Experimental results obtained with a prototype implementation on an emulated network of up to 100,000 nodes confirm Pastry’s scalability and efficiency, its ability to self-organize and adapt to node failures, and its good network locality properties.
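The routing rule described in the abstract (forward toward the node whose nodeId is numerically closest to the key) can be sketched as follows. The hex nodeIds, the `(prefix_length, next_digit)` routing-table layout, and the helper names are illustrative assumptions for this sketch, not Pastry's actual implementation:

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common digit prefix of two nodeIds (hex strings)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(key: str, my_id: str, routing_table: dict, leaf_set: list) -> str:
    """Pick the next hop for `key`: prefer a node whose nodeId shares a longer
    prefix with the key; otherwise fall back to the numerically closest node
    among the leaf set and ourselves (illustrative sketch)."""
    p = shared_prefix_len(key, my_id)
    candidate = routing_table.get((p, key[p])) if p < len(key) else None
    if candidate is not None:
        return candidate
    return min(leaf_set + [my_id],
               key=lambda nid: abs(int(nid, 16) - int(key, 16)))
```

In the real protocol the routing table has roughly one row per identifier digit, so each hop extends the shared prefix by at least one digit, which is what yields logarithmic route lengths.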


... It means that if the number of resource types present at the beginning is small, only the first few of the residue classes will be used initially for addressing; and as new clusters are formed in the future with new resource types in the system, more residue classes will become available, in sequence, for their addressing. For example, say initially n is set at 1000; so, there are 1000 possible residue classes: [0], [1], [2], [3], …, [999]. If initially there are only three clusters of peers present with three distinct resource types, the residue classes [0], [1], [2] will be used for addressing the peers in the three respective clusters. ...
... If initially there are only three clusters of peers present with three distinct resource types, the residue classes [0], [1], [2] will be used for addressing the peers in the three respective clusters. If later two new clusters are formed with two new resource types, the residue classes [3] and [4] will be used for addressing the peers in the two new clusters, in the sequence of their joining the system. Moreover, as we see, there is no limit on the size of any cluster, because any residue class can logically address an unbounded number of peers with a common interest. ...
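The residue-class addressing the excerpt describes can be sketched in a few lines; the fixed modulus of 1000 is taken from the running example, and the function names are assumptions of this sketch rather than the cited design:

```python
N_CLASSES = 1000  # modulus n fixed at system start (value from the example)

def peer_address(resource_type: int, join_order: int) -> int:
    """Logical address of the `join_order`-th peer (0-based) to join the
    cluster for `resource_type`: the members of residue class
    [resource_type] are resource_type, resource_type + n, + 2n, ..."""
    return resource_type + join_order * N_CLASSES

def cluster_of(address: int) -> int:
    """Recover the resource type (residue class) from a logical address."""
    return address % N_CLASSES
```

Because every residue class contains infinitely many integers, a cluster never runs out of addresses, matching the "no limit on the size of any cluster" claim above.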
... Earlier we have mentioned that the main objective of the present work is to show the superiority of our non-DHT, interest-based architecture over DHT-based architectures from the viewpoints of search latency and data lookup complexity. Therefore, in addition to the analytical comparison, we have performed three experiments to compare the data lookup latency, in terms of overlay hops, of the pyramid tree P2P architecture with those of three prominent P2P architectures, viz., CAN [3], Pastry [4] and Chord [5]. Results of the experiments with three different numbers of distinct resource types are shown in Figures 3, 4, and 5. In Fig. 3 we have considered pyramid tree overlay networks with 15 distinct resource types; the number of peers in each of the 15 clusters corresponding to the 15 unique resource types has been increased gradually, resulting finally in a total of 10 26 peers in the system. ...
Article
Full-text available
In this paper, we have considered a recently reported 2-layer non-DHT-based structured P2P network. Residue classes based on modular arithmetic have been used to realize the overlay topology. At the heart of the architecture (layer 1), there exists a tree-like structure known as a pyramid tree. It is not a conventional tree. A node i in this tree represents the cluster-head of a cluster of peers which are interested in a particular resource of type Ri (i.e., peers with a common interest). The cluster-head is the first among these peers to join the system. The root of the tree is assumed to be at level 1. Such a tree is complete if at each level j there are j nodes. It is incomplete if only at its leaf level, say level k, there are fewer than k nodes. Layer 2 consists of the different clusters. The network has some unique structural properties, e.g. each cluster has a diameter of only 1 overlay hop and the diameter of the network is just (2+2d), d being the number of levels of the layer-1 pyramid tree; d depends only on the number of distinct resources. Therefore, the diameter of the network is independent of the number of peers in the whole network. In the present work, we have used some of these properties to design low-latency intra- and inter-cluster data lookup protocols. Our choice of non-DHT, interest-based overlay networks is justified by the following facts: 1) the intra-cluster data lookup protocol has constant complexity and the complexity of inter-cluster data lookup is O(d) if tree traversal is used, and 2) search latency is independent of the total number of peers present in the overlay network, unlike any structured DHT-based network (in fact, unlike any existing P2P network, structured or unstructured). Experimental results also show the superiority of the proposed protocols over some noted structured networks from the viewpoints of search latency and the complexity involved.
In addition, we have presented in detail the process of handling churns and proposed a simple yet very effective technique related to cluster partitioning, which, in turn, helps in reducing the number of messages required to be exchanged to handle churns.
... With the aid of vector space semantics, SemanticPeer departs from ID-centric file sharing to loose semantic coupling in DHTs. In particular, we replace conventional logical key spaces, underlying DHTs, such as Chord [15], Pastry [16], CAN [17], and Kademlia [18], with a high-dimensional semantic vector space. We propose techniques to partition the space and to dynamically assign the resulting partitions to nodes in a structured peer-to-peer network. ...
... In this approach, publishers and subscribers have to establish an explicit agreement on the terms used to describe exchanged content. Structured peer-to-peer solutions, such as Meghdoot [11] and Hermes [12], leverage the exact-match routing primitives of their underlying DHTs (i.e., CAN [17] and Pastry [16] respectively) to map exact-match subscriptions and events to the same nodes in the overlay network. ...
... As can be observed in Fig. 11, as the value of k increases, matching recall follows and nears the upper bound for k = 20. We compare our result with an implementation of the mapping tables approach on the basis of theoretical guarantees on Lookup accuracy as presented in the Pastry DHT [16], [42]. We perform exact match key-based Lookup operations for the top − k most related terms as reported in a local Mapping Table. ...
Article
Full-text available
The decentralized and highly scalable nature of structured peer-to-peer networks, based on distributed hash tables (DHTs), makes them a great fit for facilitating the interaction and exchange of information between dynamic and geographically dispersed autonomous entities. The recent emergence of multimedia-based services and applications in the Internet of Things (IoT) has led to a noticeable shift in the type of data traffic generated by sensing devices, from structured textual and numerical content to unstructured and bulky multimedia content. The wide semantic spectrum of human-recognizable concepts that can stem from multimedia data, e.g., video and audio, introduces a very large semantic content space. The scale of the content space poses a semantic boundary between data consumers and producers in large-scale peer-to-peer publish/subscribe systems. The exact-match query model of DHTs falls short when participants use different terms to describe the same semantic concepts. In this work, we present OpenPubSub, a peer-to-peer content-based approximate semantic publish/subscribe system. We propose a hybrid event routing model that combines rendezvous routing and gossiping over a structured peer-to-peer network. The network is built on the basis of a high-dimensional semantic vector space as opposed to conventional logical key spaces. We propose methods to partition the space, construct a semantic DHT via bootstrapping, perform approximate semantic lookup operations, and cluster nodes based on their shared interests. Results show that, for an approximate event-matching upper-bound recall of 56.7%, rendezvous-based routing achieves up to 54% recall while decreasing the messaging overhead by 44%, whereas the hybrid routing approach achieves up to 43.8% recall while decreasing the messaging overhead by 59%.
... In this work, we use a P2P network formed by a set of peers (nodes) that communicate with each other using a Distributed Hash Table (DHT)-based system [49] as a routing infrastructure. Peers build an overlay network managed by the Internet service provider (ISP). ...
... Different protocols like Chord [8] or Pastry [49] can be used for the implementation of the DHT. In this work, we design our solution with the Pastry protocol [49], but it can be applied to other DHT realizations. Each peer in Pastry has a unique identifier (peerID) in a circular space of 128-bit identifiers, generated using the cryptographic hash function SHA-1. ...
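A minimal sketch of the peerID derivation described above, hashing a node's IP address into the circular 128-bit identifier space. Truncating SHA-1's 160-bit digest to 128 bits and the function names are assumptions of this sketch:

```python
import hashlib

ID_BITS = 128

def peer_id(ip: str) -> int:
    """Derive a peerID in the circular 128-bit identifier space by hashing
    the node's IP address (SHA-1 yields 160 bits; we keep the first 128)."""
    digest = hashlib.sha1(ip.encode()).digest()
    return int.from_bytes(digest[:ID_BITS // 8], "big")

def ring_distance(a: int, b: int) -> int:
    """Clockwise distance from a to b on the circular identifier space:
    arithmetic is modulo 2**128, so the space wraps around."""
    return (b - a) % (1 << ID_BITS)
```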
Article
Full-text available
We present a distributed platform aimed at processing photos taken after a natural disaster strikes by people witnessing the situation. These photos have to be processed as quickly as possible to collect statistical data used by the decision makers to coordinate rescue teams. A photo can be classified using a predefined taxonomy such as infrastructure and services, affected people, or emotional support, among others. Some photos can be classified automatically while other photos require human intervention. The proposed platform is organized in three layers: an architecture, a communication pattern algorithm and optimization modules. The architecture is based on a community of digital volunteers forming a peer-to-peer network. The digital volunteers receive photos from a centralized server that collects and integrates the results into the management process to improve the general understanding of the situation or rescue actions. We present three communication pattern algorithms that define the flow of tasks between the volunteers and the server. The first algorithm is based on point-to-point communication and the other two algorithms use caching techniques inside the peer-to-peer network. Our proposal is devised for short-term campaigns, and we aim to speed up image processing, reduce the workload of the server and reduce communication latency between the server and the volunteers. We evaluate our proposed platform under highly demanding task traffic rates. We analyze the impact of the input parameters of each communication pattern algorithm. We compare the performance of our proposed platform with different approaches presented in the technical literature, which are deployed as optimization modules. Results show that the platform, when using the cache-based communication pattern algorithms, can outperform the one-to-one communication algorithm under high task traffic rates.
... But structured P2P networks have a significant network overlay (deterministic topology) imposed over the participating nodes. The network topology is tightly controlled using DHT (Distributed Hash Table) [3] based protocols like Chord [4,5], Pastry [6], Tapestry [7], CAN [8], etc. Nodes and files are placed at specified locations using the DHT [3] over a single address space. The participating nodes and resources are mapped onto the same identifier space using SHA-1 (Secure Hash Algorithm). ...
... The participating nodes and resources are mapped onto the same identifier space using SHA-1 (Secure Hash Algorithm). DHT [3] based protocols like Chord [4,5], Pastry [6], Tapestry [7], CAN [8], BSRE (Binary Search Routing Equivalent) [9], etc. provide resource placement and retrieval mechanisms in structured P2P systems. These protocols form a circular overlay over the participating nodes. ...
... After each join or leave, an additional cost of O(1/N) for transferring keys is imposed. The basic service provided by structured P2P networks is the resource lookup service, which is implemented by DHT [3] based protocols like Chord [4,5], Pastry [6], Tapestry [7], CAN [8], etc. As reported in [14], Chord [4,5], Pastry [6], and Tapestry [7] have a finger table of size O(log N) and a network diameter of size O(log N). ...
Article
Full-text available
A P2P (peer-to-peer) network is a distributed system, dependent on IP-based networks, where independent nodes join and leave the network at their own drive. The files (resources) are shared in a distributed manner and each participating node ought to share its resources. Some files in P2P networks are accessed frequently by many users, and such files are called popular files. Replication of popular files at different nodes in structured P2P networks provides a significant reduction in resource lookup cost. Most of the schemes for resource access in structured P2P networks are governed by DHT (Distributed Hash Table) or DHT-based protocols like Chord. The Chord protocol is well accepted among structured P2P networks due to its simple notion and robust characteristics. But Chord and other resource access protocols in structured P2P networks do not consider the cardinality of replicated files to enhance the lookup performance of replicated files. In this paper, we have exploited the cardinality of the replicated files and proposed a resource cardinality-based scheme to enhance resource lookup performance in structured P2P networks. We have also proposed the notion of a trustworthiness factor to judge the reliability of a donor node. The analytical modelling and simulation analysis indicate that the proposed scheme performs better than the existing Chord and PCache protocols.
... Peer churn ensures the ability to add or remove nodes from a network. Undoubtedly, the DHT is capable of extensive data management [2]. Starting from basic key lookup, support has gradually grown to queries over multiple data points [3], [4]. ...
... The Distributed Hash Table (DHT) was adopted by researchers at a very early stage. Chord, Pastry, Tapestry, FissionE, and CAN are popular among them [1], [2], [12]-[17]. Using DHT, Li et al. proposed a congestion-free P2P network [14]. ...
... Using DHT, Li et al. proposed a congestion-free P2P network [14]. Moreover, Rowstron et al. showed the capability of DHT with their proposed model Pastry, through which an adaptable decentralized network can be achieved [2]. Key-data pairs are used to build the network and, accordingly, the architecture is capable of answering queries, as is visible through the work of Stoica and his team [1]. ...
Conference Paper
Full-text available
Although various techniques exist to build and utilize peer-to-peer (P2P) networks, researchers are working towards finding more flexible solutions for range queries. In this work, we propose our model DBST, which uses Binary Search Tree techniques to create and maintain the P2P network. Nonetheless, we are able to apply the same search technique for range queries on top of structured overlays. Range queries are undoubtedly a significant ingredient in processing large information transfers in most P2P networks. We contribute to this genre by providing a methodology for the creation of the network, insertion of nodes, deletion of nodes, management of the routing table, searching through distributed keys, and searching over multiple attributes. Furthermore, we explain a fast searching strategy using our proposed methodology. We propose an extended structure for the routing table, which enables long jumps to produce fast results. Our proposed model is effective for managing multi-attribute range queries and fast searching with a cost of O(log n + k). This mechanism can be integrated with Internet of Things (IoT) applications and mobile P2P networks.
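The BST-based range search that DBST builds on can be illustrated with a plain, non-distributed sketch: on a balanced tree it spends O(log n) work descending to the range plus O(1) per result, matching the O(log n + k) cost quoted above. All names here are illustrative, not DBST's API:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Standard (unbalanced) BST insertion; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def range_query(root, lo, hi, out):
    """Collect keys in [lo, hi] in sorted order; subtrees that cannot
    contain keys in the range are pruned, giving O(log n + k) work on a
    balanced tree with k results."""
    if root is None:
        return out
    if lo < root.key:
        range_query(root.left, lo, hi, out)
    if lo <= root.key <= hi:
        out.append(root.key)
    if hi > root.key:
        range_query(root.right, lo, hi, out)
    return out
```

In the distributed setting, each subtree prune corresponds to a routing-table entry that lets a query skip over nodes holding keys outside the range.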
... Typically in DHTs, such distributed put and get queries are performed with a message complexity of O(log n). Common examples of DHT overlays are Chord [1], Pastry [2], and Kademlia [3]. Due to their scalability, fault tolerance, fast searching, correctness under concurrency, and load balancing, DHTs are widely used in various advanced distributed system technologies such as edge and fog computing [4]-[9] and cloud computing. ...
... Furthermore, we identify open problems and discuss future research guidelines for each distributed system domain. Moreover, we present the principles behind various types of DHTs in terms of their architecture, construction and routing, covering the most well-known DHTs (i.e., Chord [1], Kademlia [3], and Pastry [2]), as well as the least surveyed ones (i.e., Skip Graph [101] and Cycloid [102]). ...
... In this section, we present an architectural overview of the most typical DHT overlays (i.e., Chord [1], Kademlia [3], and Pastry [2]), as well as some of the least surveyed ones (i.e., Skip Graph [101] and Cycloid [102]). We model a DHT as a distributed key-value store of entities. ...
Preprint
Full-text available
Several distributed system paradigms utilize Distributed Hash Tables (DHTs) to realize structured peer-to-peer (P2P) overlays. DHT structures arise as the most commonly used organizations for peers that can efficiently perform crucial services such as data storage, replication, query resolution, and load balancing. With the advances in various distributed system technologies, novel and efficient solutions based on DHTs emerge and play critical roles in system design. DHT-based methods and communications have been proposed to address challenges such as scalability, availability, reliability and performance, by considering unique characteristics of these technologies. In this article, we propose a classification of the state-of-the-art DHT-based methods focusing on their system architecture, communication, routing and technological aspects across various system domains. To the best of our knowledge, there is no comprehensive survey on DHT-based applications from system architecture and communication perspectives that spans various domains of recent distributed system technologies. We investigate the recently emerged DHT-based solutions in the seven key domains of edge and fog computing, cloud computing, blockchain, the Internet of Things (IoT), Online Social Networks (OSNs), Mobile Ad Hoc Networks (MANETs), and Vehicular Ad Hoc Networks (VANETs). In contrast to the existing surveys, our study goes beyond the commonly known DHT methods such as storage, routing, and lookup, and identifies diverse DHT-based solutions including but not limited to aggregation, task scheduling, resource management and discovery, clustering and group management, federation, data dependency management, and data transmission. Furthermore, we identify open problems and discuss future research guidelines for each domain.
... The resulting overlay and its features are used in different fields, such as decentralised resource discovery models [5], [6], as an alternative to trusted-third-party schemes. There are a number of implementations, such as Pastry [7], Chord [8], Kademlia [9] and Tapestry [10], that can be used to create a DHT overlay in a given environment. There is no centralized organizing entity that controls the joining and leaving processes of the nodes. ...
... This feature makes the DHT scalable and efficient when implemented in large-scale networks. However, ignoring the physical location of nodes during overlay creation [7]-[10] adds a significant delay to the applications that run over it [5], [6]. To address this issue, some researchers proposed solutions that remove this mismatch by creating a connection between the underlay and the overlay. ...
Conference Paper
In edge/fog computing infrastructures, resources and services are offloaded to the edge and computations are distributed among different nodes instead of being transmitted to a centralized entity. Distributed Hash Table (DHT) systems provide a solution for organizing and distributing the computations and storage without involving a trusted third party. However, the physical locations of nodes are not considered during the creation of the overlay, which causes some efficiency issues. In this paper, the Locality-aware Distributed Addressing (LADA) model is proposed, which can be adopted in distributed infrastructures to create an overlay that considers the physical locations of participating nodes. LADA aims to address the efficiency issues during the store and lookup processes in the DHT overlay. Additionally, it addresses the privacy issue in similar proposals and removes any possible set of fixed entities. Our studies showed that the proposed model is efficient and robust and is able to protect the privacy of the locations of the participating nodes.
... The DHT offers several management services, which can be beneficial for multiple purposes, such as node bootstrap, node lookup and for the management of the communities. In our scenario, we used the Pastry DHT, because it has good performance and high reliability according to [31]. It is important to point out that the DHT is merely used as a support, while it is not used for any kind of application-level storage as, for instance, in [9,19]. ...
... As presented in Sect. 3, in the reference architecture we consider the presence of a DHT layer implemented by Pastry [31]. Several DOSN proposals use the DHT as the underlying layer, since it allows retrieving information about the nodes in the network, and it can help to support data availability [18,20,35,36] and privacy. ...
Article
Full-text available
Many decentralised systems can be represented as graphs, and the detection of their community structure can uncover important properties. Several community detection algorithms have been proposed; however, only a few solutions are suitable for detecting and managing communities in a distributed and highly dynamic environment. This shortcoming is mainly due to the difficulty of defining self-organising solutions in the presence of a high rate of dynamism. The main contribution of this paper is DISCO, a distributed protocol for community detection and management in a peer-to-peer dynamic environment. Our approach is mainly targeted at Decentralised Online Social Networks (DOSNs), but it can be applied in other distributed scenarios. In the context of DOSNs, DISCO allows the discovery of communities in the local social network of a user, named the ego network, and the management of their evolution over time. DISCO is based on a Temporal Trade-off approach and exploits a set of super-peers for the management of the communities. The paper presents an extensive evaluation of the proposed approach based on a dataset gathered from Facebook, and shows the ability of DISCO to orchestrate a set of nodes to detect and manage communities in a highly dynamic and decentralised environment. The paper also proposes a comparison with a state-of-the-art approach, showing that DISCO is capable of reducing the number of critical community lifecycle events by over 25%, and reducing the average loading factor by up to 50%.
... Chord [1] was first introduced in 2001 and is still in use. While Chord has a ring topology, other DHTs [2,3,11,[17][18][19][20][21] can be implemented in a variety of network topologies, including trees (Tapestry [21]), XOR (Kademlia [3]), butterfly (Viceroy [19]), and hybrid (Pastry [20]) topologies. Some DHTs focus on minimizing the number of hops and thereby latency [2,6,[22][23][24]. ...
Conference Paper
Full-text available
... Thus, the existing load balancing techniques that could be used are DHT-based techniques [9,11,12,33], but also the VRRP protocol [34], which has an active copy and a backup copy of the running service, or components of the DNS (Domain Name System), such as DNS NAPTR records [35] combined with DNS SRV records [36]. These techniques are based on the existence of multiple physical machines behind a DNS name. ...
... The objective of this work has been, from the very beginning, to explore and extend the capabilities of Distributed Hash Tables (DHTs), since the authors have felt that there can be more to them than just storage. Peer-to-peer systems, and especially DHTs such as [9,11,12,33], have been around for quite some time, and have been the starting point for data sharing in distributed filesystems [1,19], but also for file sharing in home networks [43,44] and even in early IP telephony solutions [45]. Very recently, DHTs have come into focus again: in storage and replication [6], in name resolution and resource lookup [2,3,13], but also in routing [15], and even in blockchain design [2,3]. ...
Article
Full-text available
This work aims to identify techniques leading to a highly available request processing service by using the natural decentralization and the dispersion power of the hash function involved in a Distributed Hash Table (DHT). High availability is present mainly in systems that scale well, are balanced and are fault tolerant. These are essential features of Distributed Hash Tables (DHTs), which have been used mainly for storage purposes. The novelty of this paper’s approach is essentially based on hash functions and decentralized Distributed Hash Tables (DHTs), which lead to highly available data solutions; these are the main building block of an improved platform that offers high availability for processing clients’ requests. This is achieved by using a database also constructed on a DHT, which gives high availability to its data. Further, the model requires no changes in the interface that the request processing service already offers to its clients. Subsequently, the DHT layer is added, for the service to run on top of it, together with a load-balancing front end, in order to make the service highly available to its clients. The paper shows, via experimental validation, the good qualities of the new request processing service, demonstrating its improved scalability, load balancing and fault tolerance model.
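The "dispersion power of the hash function" that this abstract relies on for load balancing can be illustrated with a minimal consistent-hash ring that maps each client request to a service node; this is a generic sketch under assumed names, not the paper's exact front-end design:

```python
import hashlib
from bisect import bisect_right

class HashRing:
    """Minimal consistent-hash ring: requests are dispersed over service
    nodes by the hash function, so load spreads roughly evenly and a
    node's removal only remaps its own arc of the ring."""

    def __init__(self, nodes):
        # Each node is placed on the ring at the hash of its name.
        self._ring = sorted((self._h(n), n) for n in nodes)

    @staticmethod
    def _h(s: str) -> int:
        # First 64 bits of SHA-1, interpreted as an integer ring position.
        return int.from_bytes(hashlib.sha1(s.encode()).digest()[:8], "big")

    def node_for(self, request_key: str) -> str:
        # Walk clockwise from the request's hash to the next node,
        # wrapping around at the end of the ring.
        points = [p for p, _ in self._ring]
        i = bisect_right(points, self._h(request_key)) % len(self._ring)
        return self._ring[i][1]
```

The same dispatch rule is deterministic, so any front-end replica routes a given request to the same backing node without coordination.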
... 3. Connected Peers: Every peer keeps its neighborhood set, similar to [125,126]. ...
... The relay points aim to avoid stalled playbacks and are independent of the underlying P2P network. Therefore, PROMISE can be deployed using Pastry [125], Chord [130], or CAN [131]. MioStream depends on a WebSockets and WebRTC architecture; the supervisor exposes the object lookup. ...
Thesis
Graph-structured data is pervasive. Modeling large-scale network-structured datasets requires graph processing and management systems such as graph databases. Further, the analysis of graph-structured data often necessitates bulk downloads/uploads from/to the cloud or edge nodes. Unfortunately, experience has shown that malicious actors can compromise the confidentiality of highly-sensitive data stored in the cloud or shared nodes, even in an encrypted form. For particular use cases (multi-modal knowledge graphs, electronic health records, finance), network-structured datasets can be highly sensitive and require auditability, authentication, integrity protection, and privacy-preserving computation in a controlled and trusted environment, i.e., the traditional cloud computation model is not suitable for these use cases. Similarly, many modern applications utilize a "shared, replicated database" approach to provide accountability and traceability. Those applications often suffer from significant privacy issues because every node in the network can access a copy of the relevant contract code and data to guarantee the integrity of transactions and reach consensus, even in the presence of malicious actors. This dissertation proposes breaking from the traditional cloud computation model and instead shipping certified, pre-approved trusted code closer to the data to protect the confidentiality of graph-structured data. Further, our technique runs in a controlled environment in a trusted data owner node and provides proof of correct code execution. This computation can be audited in the future and provides the building block to automate a variety of real use cases that require preserving data ownership. This project utilizes trusted execution environments (TEEs) but does not rely solely on the TEE's architecture to provide privacy for data and code. We thoughtfully examine the drawbacks of using trusted execution environments in cloud environments.
Similarly, we analyze the privacy challenges exposed by the use of blockchain technologies to provide accountability and traceability. First, we propose AGAPECert, an Auditable, Generalized, Automated, Privacy-Enabling, Certification framework capable of performing auditable computation on private graph-structured data and reporting real-time aggregate certification status without disclosing underlying private graph-structured data. AGAPECert utilizes a novel mix of trusted execution environments, blockchain technologies, and a real-time graph-based API standard to provide automated, oblivious, and auditable certification. This dissertation includes the invention of two core concepts that provide accountability, data provenance, and automation for the certification process: Oblivious Smart Contracts and Private Automated Certifications. Second, we contribute an auditable and integrity-preserving graph processing model called AuditGraph.io. AuditGraph.io utilizes a unique block-based layout and a multi-modal knowledge graph, potentially improving access locality, encryption, and integrity of highly-sensitive graph-structured data. Third, we contribute a unique data store and compute engine that facilitates the analysis and presentation of graph-structured data, i.e., TruenoDB. TruenoDB offers better throughput than the state-of-the-art. Finally, this dissertation proposes integrity-preserving streaming frameworks at the edge of the network with a personalized graph-based object lookup.
... Pastry is a purely decentralized system [20] inspired by Chord's ring topology, but the identification space is circular and undirected, so searches can be done in either direction. The identifier of a node has a size of 128 bits and is obtained randomly through a hash function based on the IP address or by using a public key. ...
... Routing a message from node to node in Pastry [20] ...
... The paper introduced a novel XOR metric to calculate the distance between nodes in the key space, and a node ID routing algorithm that enabled nodes to efficiently locate other nodes close to a given target key. The single routing algorithm presented was more efficient than other algorithms, such as Pastry [4], Tapestry [5] and Plaxton [6], which all required secondary routing tables. Kademlia was outlined as easily optimised with a base other than 2, with no need for secondary routing tables. ...
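Kademlia's XOR metric mentioned in the excerpt is simple enough to state directly. A hedged sketch with illustrative names and toy-sized IDs:

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance between two node IDs: their bitwise XOR,
    interpreted as an integer. It is symmetric and satisfies the
    triangle inequality, which is what makes lookups converge."""
    return a ^ b

def closest_nodes(target: int, known: list, k: int = 3) -> list:
    """Return the k known IDs closest to `target` under the XOR metric,
    the core lookup step that needs no secondary routing tables."""
    return sorted(known, key=lambda n: xor_distance(n, target))[:k]
```

Because XOR distance depends only on the differing bit positions, nodes sharing a longer ID prefix with the target are automatically closer to it.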
Chapter
The continuously advancing digitization has provided answers to the bureaucratic problems faced by eGovernance services. This innovation led them to an era of automation, broadened the attack surface and made them a popular target for cyber attacks. eGovernance services utilize the internet, which is a location addressed system in which whoever controls its location controls not only the content itself but also the integrity and the access of that content. We propose GLASS, a decentralized solution that combines the InterPlanetary File System with Distributed Ledger Technology and Smart Contracts to secure eGovernance services. We also created a testbed environment where we measure the system’s performance.
... The paper introduced a novel XOR metric to calculate the distance between nodes in the key space and a node Id routing algorithm that enabled nodes to locate other nodes close to a given target key efficiently. The presented single routing algorithm was more optimal compared to other algorithms such as Pastry [4], Tapestry [5] and Plaxton [6] that all required secondary routing tables. Kademlia was outlined as easily optimised with a base other than 2 with no need for secondary routing tables. ...
Preprint
Full-text available
The continuously advancing digitization has provided answers to the bureaucratic problems faced by eGovernance services. This innovation led them to an era of automation, but it has also broadened the attack surface and made them a popular target for cyber attacks. eGovernance services utilize the internet, which is currently a location-addressed system in which whoever controls the location controls not only the content itself but also the integrity of, and access to, that content. We propose GLASS, a decentralised solution which combines the InterPlanetary File System (IPFS) with Distributed Ledger technology and Smart Contracts to secure eGovernance services. We also create a testbed environment where we measure the IPFS performance.
... Third generation: recently proposed P2P networks aim to provide high resilience, assuming that nodes may fail with some probability (Fiat and Saia 2002). "Structured" means that the topology of the P2P network is tightly controlled, such as a mesh (Zhao, Kubiatowicz, and Joseph 2001) (Rowstron and Druschel 2001), a ring (Stoica et al. 2001), or a d-dimensional torus (Ratnasamy et al. 2001). ...
Article
Full-text available
Distributed memory is a term used in computer science to describe a multiprocessor computer system in which each processor has its own private memory. Computational tasks can only operate on local data, so remote data requires communication with one or more remote processors. Parallel and distributed computing are frequently used together. Distributed parallel computing employs many computing devices to process tasks in parallel, whereas parallel computing on a single computer uses multiple processors to execute tasks in parallel. Distributed systems are designed separately from the core network. There are different kinds of distributed systems, such as peer-to-peer (P2P) networks, clusters, grids, and distributed storage systems. Multicore processors can be classified into two types: homogeneous and heterogeneous. This paper reviews the impact of the distributed-memory parallel processing approach on the performance of multicomputer multicore systems. It also surveys a number of methods used in distributed-memory systems and discusses which best enhances multicore performance in distributed systems. The best-performing methods were those run on GNU/Linux 4.8.0-36, an Intel Xeon 2.5, and the Python programming language.
... is a large-scale peer-to-peer persistent storage utility using Pastry [84]. In PAST, replicas of a file are stored on the nodes closest to the file ID. ...
Thesis
Access to the Web of Data is nowadays of real interest for research, mainly in the sense that the clients consuming or processing this data are more and more numerous and have various specificities (mobility, Internet, autonomy, storage, etc.). Tools such as Web applications, search engines, e-learning platforms, etc., exploit the Web of Data to offer end-users services that contribute to the improvement of daily activities. In this context, we are working on Web of Data access, considering constraints such as customer mobility and intermittent availability of the Internet connection. We are interested in mobility as our work is oriented towards end-users with mobile devices such as smartphones, tablets, laptops, etc. The intermittency of Internet connection refers herein to scenarios of unavailability of national or international links that make remote data sources inaccessible. We target a scenario where users form a peer-to-peer network such that anyone can generate information and make it available to everyone else on the network. Thus, we survey and compare several solutions (models, architectures,etc.) dedicated to Web of Data access by mobile contributors and discussed in relation to the underlying network architectures and data models considered. We present a conceptual study of peer-to-peer solutions based on gossip protocols dedicated to design the connected overlay networks and present a detailed analysis of data replication systems whose general objective is to ensure a system’s local data availability. On the basis of this work, we proposed an architecture adapted to constraining environments and allowing mobile contributors to share locally, via a browser network, an RDF dataset. The architecture consists of 3 levels: single peers, super peers and remote sources. 
Two main axes are considered for the implementation of this architecture: first, the construction and maintenance of connectivity, ensured by the gossip protocol; second, high data availability, ensured by a replication mechanism. A distinctive feature of our approach is that it considers the location of each participant's neighbours to widen the search perimeter, and integrates super-peers on which the data graph is replicated, improving data availability. Finally, we evaluated our architecture experimentally through extensive simulation configured to capture the key aspects of our motivating scenario: supporting data exchange between the participants of a local event.
... DHTs have been widely studied because of their attractive properties: efficiency and simplicity with Chord [12], controlled data placement with SkipNet [13], Pastry [14] routing and localization, and better consistency and reliable performance with Kademlia [15]. ...
Article
The advent of Big Data has seen the emergence of new processing and storage challenges. These challenges are often solved by distributed processing. Distributed systems are inherently dynamic and unstable, so it is realistic to expect that some resources will fail during use. Load balancing and task scheduling is an important step in determining the performance of parallel applications. Hence the need to design load balancing algorithms adapted to grid computing. In this paper, we propose a dynamic and hierarchical load balancing strategy at two levels: Intrascheduler load balancing, in order to avoid the use of the large-scale communication network, and interscheduler load balancing, for a load regulation of our whole system. The strategy allows improving the average response time of CLOAK-Reduce application tasks with minimal communication. We first focus on the three performance indicators, namely response time, process latency and running time of MapReduce tasks.
... With grid, cloud and stand-alone systems, a jungle is created. Configurability and dependability are provided by various P2P grid systems, such as Gnutella [12], Freenet [13], and Pastry [14]. P2P networks may be unstructured or structured. ...
... Nevertheless, having a small routing table leads to an increase in the number of hops and thus reducing the overall data access latency performance [19,20]. Several DHT implementations have been proposed to reduce the number of hops such as Kademlia [21] or Pastry [22]. These solutions increase the number of neighbors in order to reduce the number of hops. ...
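The trade-off noted here, fewer hops at the cost of more neighbors, can be made concrete with a back-of-the-envelope calculation for a Pastry-style DHT with base-2^b digits. The exact constants vary by design; this is only indicative:

```python
import math

def dht_tradeoff(n_nodes: int, b: int):
    """For a DHT routing on base 2**b digits, expected hops scale as
    log_{2^b}(N), while the routing table holds roughly (2**b - 1)
    entries per row over ceil(log_{2^b}(N)) rows."""
    hops = math.log(n_nodes, 2 ** b)
    table = (2 ** b - 1) * math.ceil(hops)
    return hops, table

# Larger b: fewer hops, but bigger routing tables.
for b in (1, 2, 4):
    hops, table = dht_tradeoff(100_000, b)
    print(f"b={b}: ~{hops:.1f} hops, ~{table} routing-table entries")
```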
Preprint
Full-text available
Locating data efficiently is a key process in every distributed data storage solution, particularly those deployed in multi-site environments such as Cloud and Fog computing. Nevertheless, the existing protocols dedicated to this task are not compatible with the requirements of the infrastructures that underlie such computing paradigms. In this paper, we first review the three fundamental mechanisms on which existing protocols rely to locate data. We demonstrate that these mechanisms all face the same set of limitations and appear to trade off three distinct properties: i) scalability, ii) the ability to deal with network topology changes, and iii) the constraints on the data naming process. After laying out our motivation and identifying the related trade-offs in existing systems, we propose a conjecture (and provide a proof for it) stating that these three properties cannot be met simultaneously, which we believe is a new fundamental trade-off that distributed storage systems built on these three mechanisms have to face. We conclude by discussing some of the implications of this novel result.
... This is practical in a tightly coupled cluster but routing requests to a central server negates any QoS improvements in a geo-distributed fog deployment. Pastry [4], OceanStore [5], or Cassandra [6] use hashing, which scales well and is easily implemented, but cannot take data movement based on proximity into account. ...
Preprint
Mobile clients that consume and produce data are abundant in fog environments. Low latency access to this data can only be achieved by storing it in close physical proximity to the clients. Current data store systems fall short as they do not replicate data based on client movement. We propose an approach to predictive replica placement that autonomously and proactively replicates data close to likely client locations.
... With an acceptable error probability, randomization can result in a significant reduction in time and message complexities. This is highly advantageous for large scale distributed systems (e.g., P2P systems, overlay and sensor networks [42,43,47]), where scalability is an important issue. Furthermore, in anonymous networks, a randomized solution is often possible by randomly assigning unique identifiers to nodes (as done herein), whereas a corresponding deterministic solution is impossible (see [2]). ...
Article
Full-text available
In this paper, we look at the problem of randomized leader election in synchronous distributed networks with a special focus on the message complexity. We provide an algorithm that solves the implicit version of leader election (where non-leader nodes need not be aware of the identity of the leader) in any general network with $O(\sqrt{n}\log^{7/2} n \cdot t_{mix})$ messages and in $O(t_{mix}\log^2 n)$ time, where n is the number of nodes and $t_{mix}$ refers to the mixing time of a random walk in the network graph G. For several classes of well-connected networks (those that have large conductance or, alternatively, small mixing times, e.g., expanders, hypercubes), the above result implies extremely efficient (sublinear running time and messages) leader election algorithms. Correspondingly, we show that no substantial improvement over our algorithm is possible, by presenting an almost matching lower bound for randomized leader election. We show that $\Omega(\sqrt{n}/\phi^{3/4})$ messages are needed for any leader election algorithm that succeeds with probability at least $1-o(1)$, where $\phi$ refers to the conductance of the graph. To the best of our knowledge, this is the first work that shows a dependence between the time and message complexity of leader election and the connectivity of the graph G, which is often characterized by the graph's conductance $\phi$. Apart from the $\Omega(m)$ bound in Kutten et al. (J ACM 62(1):7:1–7:27, 2015) (where m denotes the number of edges of the graph), this work also provides one of the first non-trivial lower bounds for leader election in general networks.
... While constructing and maintaining the correct topology adds additional work for the algorithm designer, common operations such as routing and searching are much more efficient with these structured networks. There are many examples of such overlays, including Chord [19], Pastry [17], and Tapestry [21]. These early examples of structured networks, however, provided very limited fault tolerance. ...
Preprint
Overlay networks, where nodes communicate with neighbors over logical links consisting of zero or more physical links, have become an important part of modern networking. From data centers to IoT devices, overlay networks are used to organize a diverse set of processes for efficient operations like searching and routing. Many of these overlay networks operate in fragile environments where faults that perturb the logical network topology are commonplace. Self-stabilizing overlay networks offer one approach for managing these faults, promising to build or restore a particular topology from any weakly-connected initial configuration. Designing efficient self-stabilizing algorithms for many topologies, however, is not an easy task. For non-trivial topologies that have desirable properties like low diameter and robust routing in the face of node or link failures, self-stabilizing algorithms to date have had at least linear running time or space requirements. In this work, we address this issue by presenting an algorithm for building a Chord network that has polylogarithmic time and space complexity. Furthermore, we discuss how the technique we use for building this Chord network can be generalized into a "design pattern" for other desirable overlay network topologies.
... Although the context-oriented solution [11] provides more accurate lookup results, this accuracy comes at the cost of considerable overhead and delay compared to DHT-based lookup systems. There are several protocols that implement a DHT, such as Chord [28], Kademlia [17], Pastry [29] and Tapestry [30]. The accuracy of DHT lookup results can be improved by different ranking algorithms, such as [31]. ...
Article
Full-text available
While the number of devices connected together as the Internet of Things (IoT) is growing, the demand for an efficient and secure model of resource discovery in IoT is increasing. An efficient resource discovery model distributes the registration and discovery workload among many nodes and allow the resources to be discovered based on their attributes. In most cases this discovery ability should be restricted to a number of clients based on their attributes, otherwise, any client in the system can discover any registered resource. In a binary discovery policy, any client with the shared secret key can discover and decrypt the address data of a registered resource regardless of the attributes of the client. In this paper we propose Attred, a decentralized resource discovery model using the Region-based Distributed Hash Table (RDHT) that allows secure and location-aware discovery of the resources in IoT network. Using Attribute Based Encryption (ABE) and based on predefined discovery policies by the resources, Attred allows clients only by their inherent attributes, to discover the resources in the network. Attred distributes the workload of key generations and resource registration and reduces the risk of central authority management. In addition, some of the heavy computations in our proposed model can be securely distributed using secret sharing that allows a more efficient resource registration, without affecting the required security properties. The performance analysis results showed that the distributed computation can significantly reduce the computation cost while maintaining the functionality. The performance and security analysis results also showed that our model can efficiently provide the required security properties of discovery correctness, soundness, resource privacy and client privacy.
... These systems provide a mapping between the data identifier and location, so that queries can be efficiently routed to the node(s) with the desired data, e.g. Chord [20], Pastry [21], Tapestry [22], and Kademlia [6] that inspires the IPFS overlay. ...
Preprint
The InterPlanetary File System (IPFS) is a hyper-media distribution protocol, addressed by content and identities. It aims to make the web faster, safer, and more open. The JavaScript implementation of IPFS runs in the browser, benefiting from the mass-adoption potential that this yields. Startrail takes advantage of the IPFS ecosystem and strives to further evolve it, making it more scalable and performant through the implementation of an adaptive network caching mechanism. Our solution aims to add resilience to IPFS and improve its overall scalability, by avoiding overloading the nodes providing highly popular content, particularly during flash-crowd-like conditions where popularity and demand grow suddenly. We add a novel, crucial key component to enable an IPFS-based decentralized Content Distribution Network (CDN). Following a peer-to-peer architecture, it runs on a scalable, highly available network of untrusted nodes that distribute immutable and authenticated objects which are cached progressively towards the sources of requests.
... CH is also used for information retrieval (Grossman and Frieder 2004), distributed databases (Ozsu and Valduriez 2011;Carlson 2013;Nishtala et al. 2013), and cloud systems (Karger et al. 1999;Nasri and Sharifi 2009;Wang and Loguinov 2007). Furthermore, CH resolves similar load-balancing issues that arise in peer-to-peer systems (Rowstron and Druschel 2001;Castro et al. 2002), and content-addressable networks (Ratnasamy et al. 2001). ...
Article
Dynamic load balancing lies at the heart of distributed caching. Here, the goal is to assign objects (load) to servers (computing nodes) in a way that provides load balancing while at the same time dynamically adjusts to the addition or removal of servers. Load balancing is a critical topic in many areas including cloud systems, distributed databases, and distributed and data-parallel machine learning. A popular and widely adopted solution to dynamic load balancing is the two-decade-old Consistent Hashing (CH). Recently, an elegant extension was provided to account for server bounds. In this paper, we identify that existing methodologies for CH and its variants suffer from cascaded overflow, leading to poor load balancing. This cascading effect leads to decreasing performance of the hashing procedure with increasing load. To overcome the cascading effect, we propose a simple solution to CH based on recent advances in fast minwise hashing. We show, both theoretically and empirically, that our proposed solution is significantly superior for load balancing and is optimal in many senses. On the AOL search dataset and Indiana University Clicks dataset with real user activity, our proposed solution reduces cache misses by several magnitudes.
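The consistent-hashing baseline this abstract builds on can be sketched as follows. This is a minimal textbook ring with virtual nodes, not the paper's minwise-hashing variant; the class and parameter names are illustrative:

```python
import hashlib
from bisect import bisect_right

class ConsistentHash:
    """Minimal consistent-hashing ring with virtual nodes. Adding or
    removing a server only remaps keys adjacent to its ring positions."""

    def __init__(self, servers, vnodes=100):
        self.ring = []  # sorted list of (ring point, server)
        for s in servers:
            self.add(s, vnodes)

    @staticmethod
    def _h(key: str) -> int:
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def add(self, server: str, vnodes: int = 100):
        for i in range(vnodes):
            self.ring.append((self._h(f"{server}#{i}"), server))
        self.ring.sort()

    def lookup(self, key: str) -> str:
        """A key is served by the first ring point clockwise from its hash."""
        points = [p for p, _ in self.ring]
        i = bisect_right(points, self._h(key)) % len(self.ring)
        return self.ring[i][1]
```

With n servers, adding one remaps only about 1/(n+1) of the keys, which is the property the "cascaded overflow" analysis in the paper starts from.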
... DHTs have been widely studied because of their attractive properties: efficiency and simplicity with Chord [2], controlled data placement with SkipNet [3], Pastry [4] routing and localization, and good consistency and reliable performance with Kademlia [5]. ...
... Many peer-to-peer (P2P) systems [30][31][32][33][34] adopt the idea of key-value pairs. Although data in P2P systems are stored in key-value pair format, the goal of P2P is to offer file sharing, which is different from our paper. ...
Article
Full-text available
NoSQL databases are flexible and efficient for many data intensive applications, and the key-value store is one of them. In recent years, a new Ethernet accessed disk drive called the “Kinetic Drive” was developed by Seagate. This new Kinetic Drive is specially designed for key-value stores. Users can directly access data with a Kinetic Drive via its IP address without going through a storage server/layer. With this new innovation, the storage stack and architectures of key-value store systems have been greatly changed. In this paper, we propose a novel global key-value store system based on Kinetic Drives. We explore data management issues including data access, key indexing, data backup, and recovery. We offer scalable solutions with small storage overhead. The performance evaluation shows that our location-aware design and backup approach can reduce the average distance traveled for data access requests.
... In conclusion, a node keeps ties to its successors from the finger table, with the first finger being the node's direct successor, and is also linked with its immediate predecessor on the circle. Keeping a successor list is similar to Pastry's leaf set (used to route in "slow" mode when there are no other choices) [40]. An example of the circular structure of the hash space of the Chord protocol with m = 7 bits is shown in Figure 2. In this example some nodes have already joined the network and the finger table is displayed for one of these nodes. ...
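The finger-table rule described above can be written directly. A minimal sketch assuming a global view of the ring (a real Chord node fills fingers via lookups, not a sorted list); `finger_table` and its parameters are illustrative:

```python
def finger_table(n: int, nodes: list[int], m: int = 7) -> list[int]:
    """Chord finger table for node n in a 2**m id space: finger[i] is the
    first live node at or after (n + 2**i) mod 2**m on the circle."""
    ring = sorted(nodes)

    def successor(k: int) -> int:
        for node in ring:
            if node >= k:
                return node
        return ring[0]  # wrap around the circle

    return [successor((n + 2 ** i) % 2 ** m) for i in range(m)]
```

As the text notes, the first finger is simply the node's direct successor, e.g. `finger_table(5, [5, 20, 40, 80, 100])[0]` is `20`.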
Conference Paper
Full-text available
In the modern era of the Internet of Things (IoT) and Industry 4.0, there is a growing need for reliable long-range wireless communications. LoRa is an emerging technology for effective long-range communication which can be directly applied to IoT applications. Wireless sensor networks (WSNs) are an efficient infrastructure in which sensors act as nodes and exchange information among themselves. Distributed applications such as Peer-to-Peer (P2P) networks are inextricably linked with Distributed Hash Tables (DHTs), which offer effective and speedy data indexing. The Chord algorithm, a Distributed Hash Table structure, enables the node lookup operation that is a major algorithmic function of P2P networks. In this paper, the inner workings of the Chord protocol are highlighted, along with a modified version of it for WSNs. Additionally, we adapt the proposed method to LoRa networks where sensors function as nodes. The outcomes of the proposed method are encouraging in terms of complexity and usability; future directions of this work include the deployment of the proposed method in a large-scale environment, security enhancements, and distributed join, leave and lookup operations.
... The more distributed a network is, the more resilient it generally is to various forms of attack and disruption, and the less the network as a whole depends on any single node (Rowstron and Druschel, 2001). This also indicates that the network can be less dependent on any individual, company, or organization operating a particular node. ...
Chapter
Full-text available
The metaverse is developing as a structure that provides three-dimensional access through the creation of many virtual worlds with different technologies. Before the emergence of the metaverse, the internet underwent a major transformation. At the heart of this transformation, the shift from centralized to decentralized systems played the most important role. Information security, power, politics, and economics are the driving forces behind interdisciplinary decentralization. Decentralization gained further importance once it became realizable through blockchain technology. Blockchain enables decentralization-based transactions in many fields. The economic dimension of these systems is substantial, and cryptocurrencies have come into use within this framework. A decentralized internet is the most important prerequisite for these developments. In this process, new structures are being developed that provide richer interaction with websites through virtual rooms and three-dimensional avatars. New experiments and applications around the metaverse are carried out continuously. In the information technology age, every experiment and application is brought to the attention of users and developers as quickly as possible. Decentralization lies behind the rapid pace of this development. With intermediaries removed, peer-to-peer transactions become possible, and users accomplish what could not be done before by using distributed, decentralized network technology. This allows very large numbers of system users to be reached in much shorter times. Accordingly, many economists state that the metaverse market has a potential exceeding trillions of dollars. A person connecting to the metaverse wants to protect the personal data they bring into this environment and the value they earn from it. In this sense, decentralized systems provide trust both to users and to those providing services over these systems.
Web 3.0 will take information-sharing processes to a higher level with smart contracts, crypto assets, and the token economy. In this chapter, distributed systems and the concept of decentralization are examined from an interdisciplinary perspective. The opportunities and potential problems of decentralized systems are evaluated, and the concept of the decentralized metaverse is explained.
... Information Dissemination: Information dissemination focuses on information management in an ad hoc network environment. The motivating applications are described in Chapter 5, as are the information discovery and management techniques in peer-to-peer environments (Stoica, Morris, Liben-Nowell, Karger, Kaashoek, Dabek, and Balakrishnan 2003; Zhao, Kubiatowicz, and Joseph 2001; Rowstron and Druschel 2001). The approach for this thesis uses the characteristics of the information and its demand and supply characteristics to create an overlay. ...
Thesis
An emergent trend in large scale distributed systems enables collaboration between large numbers of independent resource providers. Grid computing and peer-to-peer computing are part of this trend. Resource management in such systems is inherently different from that found in traditional distributed systems, the key difference being that the new classes of systems are primarily designed to operate under inconsistent system information and temporally varying operating environments. Although primarily used to enable collaboration of computational resources, these systems have also found application in the field of distributed data management. Although the principles of grid computing and peer-to-peer computing have found many applications, little effort has been made to abstract the common requirements in order to provide a conceptual resource framework. This thesis investigates the alleviation of such common requirements through investigations in the fields of online scheduling, information dissemination in peer-to-peer networks, and query processing in distributed stream processing systems. A survey of system types is provided to highlight the new trends observed. A top-down approach to developing a unifying model seems inapplicable, and the range of problems encountered in these system types can only be addressed by identifying common trends and addressing them individually. Consequently, three application domains have been identified in the respective fields of online scheduling, data dissemination and stream query processing. Each of these application domains is investigated individually. For each application domain, a review of the state of the art is followed by a precise definition of the problem addressed, and the solutions developed are substantiated with experimental evaluation. Findings from individual applications have been summarized to generalize the observations towards an overall hypothesis.
... However, they fail when a search using partial matching is required and create additional overhead due to the management of the network architecture. There are two types of structured overlays [1]: Distributed Hash Table (DHT) based systems (e.g., Chord [6], CAN [7], Pastry [8], Tapestry [9], P-Grid [10], D-Grid [11]) and non-DHT based systems (e.g., Mercury [12]). A DHT provides a lookup service similar to a hash table: key-value pairs are stored in the DHT, and any participating node can efficiently retrieve the value associated with a given key. ...
Article
Full-text available
Advances in high-performance and distributed computing and the development of manycore technologies have led to the widespread usage of Peer-to-Peer (P2P), Cloud, and Grid computing systems comprised of heterogeneous machines. Though each of these computing systems has its own distinct environment, they share a key property: the ability to share resources/services among all the entities distributed across the system. Therefore, an appropriate strategy for discovering necessary resources with low overhead and in little time is essential for achieving transparent communication in distributed systems. In this summary, we review the implementation of resource discovery strategies in different distributed system architectures and explain the algorithms vital for designing resource discovery protocols in decentralized peer-to-peer architectures.
... P2P Networks. There have been countless P2P overlay architecture proposals, including dozens of DHT structures [30,32,54,59,72], and tens of applications, e.g., large-scale content delivery platforms [15], and services such as decentralized social networks [25]. Rather than devising an entirely new system, IPFS utilizes the Kademlia DHT for content indexing [44]. ...
Preprint
Full-text available
Recent years have witnessed growing consolidation of web operations. For example, the majority of web traffic now originates from a few organizations, and even micro-websites often choose to host on large pre-existing cloud infrastructures. In response to this, the "Decentralized Web" attempts to distribute ownership and operation of web services more evenly. This paper describes the design and implementation of the largest and most widely used Decentralized Web platform - the InterPlanetary File System (IPFS) - an open-source, content-addressable peer-to-peer network that provides distributed data storage and delivery. IPFS has millions of daily content retrievals and already underpins dozens of third-party applications. This paper evaluates the performance of IPFS by introducing a set of measurement methodologies that allow us to uncover the characteristics of peers in the IPFS network. We reveal presence in more than 2700 Autonomous Systems and 152 countries, the majority of which operate outside large central cloud providers like Amazon or Azure. We further evaluate IPFS performance, showing that both publication and retrieval delays are acceptable for a wide range of use cases. Finally, we share our datasets, experiences and lessons learned.
... A P2P network with a structured architecture [6,7,8] was chosen because it performs lookups and insertions of the computed reputation values quickly, as it requires fewer messages to obtain a station's reputation. Structured networks, however, have the drawback of relying on a fixed topology, usually a ring, so the cost of maintaining peers in the network becomes high as the network grows. ...
Conference Paper
In today's Web environment, the possibility of direct interaction between users, without central authorities mediating access to a service, creates the need for efficient systems that guarantee the security of transactions. Reputation systems, widely used in P2P networks, aim to evaluate individual peers based on their previous interactions. Such systems can be used to guarantee security at the service level. This paper proposes the use of a service-oriented reputation system based on fuzzy logic as a way to raise the security level of the services exchanged between stations. The paper presents a description of the proposed system, the application scenario, and the simulations carried out to evaluate it.
Article
Object-based storage systems have been widely used for various scenarios such as file storage, block storage, blob (e.g., large videos) storage, etc., where the data is placed among a large number of object storage devices (OSDs). Data placement is critical for the scalability of decentralized object-based storage systems. The state-of-the-art CRUSH placement method is a decentralized algorithm that deterministically places object replicas onto storage devices without relying on a central directory. While enjoying the benefits of decentralization such as high scalability, robustness, and performance, CRUSH-based storage systems suffer from uncontrolled data migration when expanding the capacity of the storage clusters (i.e., adding new OSDs), which is determined by the nature of CRUSH and will cause significant performance degradation when the expansion is nontrivial. This paper presents MapX, a novel extension to CRUSH that uses an extra time-dimension mapping (from object creation times to cluster expansion times) for controlling data migration after cluster expansions. Each expansion is viewed as a new layer of the CRUSH map represented by a virtual node beneath the CRUSH root. MapX controls the mapping from objects onto layers by manipulating the timestamps of the intermediate placement groups (PGs). MapX is applicable to a large variety of object-based storage scenarios where object timestamps can be maintained as higher-level metadata. We have applied MapX to the state-of-the-art Ceph-RBD (RADOS Block Device) to implement a migration-controllable, decentralized object-based block store (called Oasis). Oasis extends the RBD metadata structure to maintain and retrieve approximate object creation times (for migration control) at the granularity of expansion layers.
Experimental results show that the MapX-based Oasis block store outperforms the CRUSH-based Ceph-RBD (which is busy migrating objects after expansions) by 3.17×–4.31× in tail latency, and by 76.3% (resp. 83.8%) in IOPS for reads (resp. writes).
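The time-dimension mapping this abstract describes can be illustrated roughly as follows — a toy sketch, not the MapX implementation: the function names, the flat per-layer OSD lists, and the hash-based placement are assumptions of this sketch (the real system maps objects to placement groups inside a CRUSH hierarchy).

```python
# Toy sketch of time-dimension placement: each cluster expansion opens a
# new "layer"; an object is placed using only the OSD set of the layer
# that was current when it was created, so old objects never migrate
# when new OSDs arrive. Names here are hypothetical.
import bisect
import hashlib

LAYER_TIMES = [0]                   # expansion timestamps (layer 0 = initial)
LAYER_OSDS = [["osd0", "osd1"]]     # OSDs available as of each layer

def expand(time, new_osds):
    """A capacity expansion adds a layer rather than reshuffling old data."""
    LAYER_TIMES.append(time)
    LAYER_OSDS.append(LAYER_OSDS[-1] + new_osds)

def place(obj_id, created_at):
    """Hash the object onto the OSD set of its creation-time layer."""
    layer = bisect.bisect_right(LAYER_TIMES, created_at) - 1
    osds = LAYER_OSDS[layer]
    h = int(hashlib.sha256(obj_id.encode()).hexdigest(), 16)
    return osds[h % len(osds)]
```

Because `place` consults only the layer in effect at creation time, an expansion changes the placement of no existing object — which is the migration-control property the abstract claims.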
Article
Full-text available
Over the last two decades, peer-to-peer systems have proven their vital role in sharing various resources and services with diverse user communities over the internet. The unstructured P2P network is the most popular topology, and the resources are fully distributed among participating peers. Therefore, searching is a challenging issue due to the absence of control over resource locations. Intelligent decisions should be made to select a particular number of neighbors that can hold relevant resources for queries, instead of selecting neighbors randomly. In this paper, an intelligent neighbor selection (INS) algorithm is introduced that uses a reinforcement learning approach, ‘Q-learning’. The main objective of this algorithm is to achieve better retrieval effectiveness with reduced searching costs: fewer connected peers, fewer exchanged messages, and less time. To achieve this, INS relies on Q-learning, which builds a Q-table in each peer, stores the Q-values gathered from the results of previously sent queries, and uses them for forthcoming queries. The cold-start issue during the training phase is also addressed in this research, which allows INS to improve its results continuously. The simulation results show a significant improvement in searching for a resource in comparison with controlled flooding, with the learning process improving after sufficient training. Here, retrieval effectiveness, search cost in terms of connected peers, and average overhead are 1.23, 104, and 167, respectively.
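The neighbor-selection idea can be sketched as a small Q-learning loop. This is a hedged illustration, not the authors' INS algorithm: the peer names, reward values, epsilon-greedy policy, and stateless Q-update are all assumptions of this sketch.

```python
# Sketch of Q-learning-driven neighbour selection: each peer keeps a
# Q-value per (resource type, neighbour); hits reinforce a neighbour so
# later queries for that type prefer it over random selection.
import random

random.seed(1)
ALPHA, EPSILON = 0.5, 0.1           # learning rate, exploration rate

class Peer:
    def __init__(self, name, resources):
        self.name, self.resources = name, set(resources)
        self.neighbours, self.q = [], {}   # q[(rtype, neighbour name)]

    def select(self, rtype, k=1):
        """Pick k neighbours: mostly exploit learned Q-values, explore a little."""
        if random.random() < EPSILON:
            return random.sample(self.neighbours, k)
        ranked = sorted(self.neighbours,
                        key=lambda n: self.q.get((rtype, n.name), 0.0),
                        reverse=True)
        return ranked[:k]

    def query(self, rtype):
        for n in self.select(rtype):
            reward = 1.0 if rtype in n.resources else -0.1
            old = self.q.get((rtype, n.name), 0.0)
            # stateless Q-update: move the estimate toward the reward
            self.q[(rtype, n.name)] = old + ALPHA * (reward - old)
            if reward > 0:
                return n.name           # hit: resource located
        return None

# Tiny demo: A learns that neighbour B holds "video" content.
A, B, C = Peer("A", []), Peer("B", ["video"]), Peer("C", [])
A.neighbours = [B, C]
for _ in range(50):
    A.query("video")
```

After training, A's Q-value for B on "video" queries dominates, so A contacts one peer instead of flooding — the cost reduction the abstract reports.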
Article
The advent of virtualization and cloud computing has fundamentally changed how distributed applications and services are deployed and managed. With the proliferation of IoT and mobile devices, virtualized systems akin to those offered by cloud providers are increasingly needed geographically near the network’s edge to perform processing tasks in proximity to the data sources and sinks. Latency-sensitive, bandwidth-intensive applications can be decomposed into workflows that leverage resources at the edge – a model referred to as fog computing. Not only is performance important, but a trustworthy network is fundamental to guaranteeing privacy and integrity at the network layer. This paper describes Bounded Flood, a novel technique that enables virtual private Ethernet networks that span edge and cloud resources – including those constrained by NAT and firewall middleboxes. Bounded Flood builds upon a scalable structured peer-to-peer overlay, and is novel in how it integrates overlay tunnels with SDN software switches to create a virtual network with dynamic membership – supporting unmodified Ethernet/IP stacks to facilitate the deployment of edge applications. Bounded Flood has been implemented as the core of the EdgeVPN open-source virtual private network software system for edge computing. Experiments with the software demonstrate its functionality and scalability – one of which includes Kubernetes with Flannel across Raspberry Pi 4 edge devices behind different NATs.
Article
Peer-to-peer networks offer a solid foundation for wide-scale resource sharing, collaborative computing, and data distribution. Such networks have been commonly used for group communication by overlaying a publish/subscribe service atop their routing substrate. In this work, we focus on offering a group communication service that targets unstructured content such as images and videos for dissemination at internet scale. The decoupled nature of publish/subscribe systems exacerbated by the decentralized and large-scale nature of peer-to-peer networks brings about a semantic boundary between publishers and subscribers. More precisely, the large semantic space of human-level recognition creates a very large content space of object labels, attributes, and relationships. The scale of the content space makes it nearly impossible for participants to agree on a bounded set of terms for subscribers to express their exact interests. We identify an inherent limitation of peer-to-peer networks lying in the exact-match property of their key-based routing primitives. We propose an approximate matching model where participants agree on a distributional model of word meaning that maps terms to a vector space. We overcome the exact-match limitation by proposing a novel distributed lookup protocol and algorithm to construct a peer-to-peer network and route content. We replace conventional logical key spaces with a high-dimensional vector space that preserves the semantic properties of the data being mapped. Experiments show that the proposed model achieves more than 97% recall in routing accuracy, that is, locating a node responsible for storing a data item in a few routing hops. Furthermore, results also show that the network achieves over 90% recall in approximately matching two semantically related terms via rendezvous routing.
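The rendezvous-by-similarity idea can be sketched with hand-made vectors. This is illustrative only: a real deployment would use trained word embeddings and a distributed lookup protocol, whereas here the embeddings are invented and the node list is global.

```python
# Toy sketch of routing by semantic similarity instead of exact keys:
# content terms and node ids live in the same vector space, and a term
# is routed to the node whose vector is most similar, so semantically
# related terms rendezvous at the same node.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical embeddings; axes read as (animal-ness, vehicle-ness, food-ness).
EMBED = {
    "cat":    (0.9, 0.0, 0.1), "kitten": (0.8, 0.1, 0.1),
    "car":    (0.0, 0.9, 0.0), "truck":  (0.1, 0.8, 0.0),
    "pizza":  (0.1, 0.0, 0.9),
}
NODES = {"n1": (1.0, 0.0, 0.0), "n2": (0.0, 1.0, 0.0), "n3": (0.0, 0.0, 1.0)}

def route(term):
    """Send the term to the node with the most similar id-vector."""
    v = EMBED[term]
    return max(NODES, key=lambda n: cosine(v, NODES[n]))
```

The key property is that "cat" and "kitten" land on the same node without the publisher and subscriber ever agreeing on an exact term — the approximate-match routing the abstract describes.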
Chapter
Node discovery is a fundamental service for any overlay network. It is a particular challenge to provide unbiased discovery in untrustworthy environments, e.g., anonymization networks. Although a major line of research focused on solving this problem, proposed methods have been shown to be vulnerable either to active attacks or to leak routing information, both threatening the anonymity of users. In response, we propose GuardedGossip—a novel gossip-based node discovery protocol—that achieves an unbiased random node discovery in a fully-decentralized and highly-scalable fashion. It is built on top of a Chord distributed hash table (DHT) and relies on witness nodes and bound checks to resist active attacks. To limit routing information leakages, GuardedGossip uses gossiping to create uncertainty in the process of node discovery. By incorporating the principles of DHTs with the unstructured nature of gossiping in a subtle way, we profit from the strengths of both techniques while carefully mitigating their shortcomings. We show that GuardedGossip provides a sufficient level of security for users even if 20% of the participating nodes are malicious. Concurrently, our system scales gracefully and provides an adequate overhead for its security and privacy benefits.
Chapter
This article presents GRAPP&S (Grid APPlication & Services), a specification of a multi-scale architecture for the management (unified storage and indexing) of data and services near users. All types of data and services are managed through the use of specific nodes called proxies. GRAPP&S's architecture consists of three types of nodes, each with different roles. These nodes are grouped into communities (local networks) following multi-scale principles. Data is presented transparently to the user through proxies (one kind of GRAPP&S node) specific to each type of data. In addition, the GRAPP&S architecture has been designed to allow the interconnection of different communities and the application of security and privacy policies, both within a community and between different communities. Our framework adopts a prefix-based routing mechanism for searching and accessing data in GRAPP&S. This access does not depend on a direct connection between nodes, as in most P2P networks: in GRAPP&S it is always possible to route data along an indirect transfer path when a direct connection between the nodes is not possible.
Article
Full-text available
Experts from all over the world offer an opportunity to populate decision-making systems. But filling decision-making systems with data does not have an exact quantitative characterization. It is good when the expert is completely confident in a decision; but some decisions amount to an internal assessment without justification or experimentation, while others are hampered by past experience. To overcome this type of problem, it is necessary to develop a system based on the behaviour of both crisp and fuzzy data. This article describes a method for constructing a decision-making system over crisp and fuzzy data using Petri nets.
Article
The corporate environment, currently focused on inclusive workflows, involves bringing together employees from different countries to achieve a common organizational goal. Difficulties that arise in the process may be due to a lack of sufficient experience or, on the contrary, to past experience interfering with a decision. To overcome such problems, it is important to develop a decision support system, a team decision support tool, and an information system for managers. The information security of the decision-making system is achieved by modeling the security elements of the system that analyze the integrity of information during its operation. Information security modeling is based on methods that ungroup data and create subsystems with heterogeneous data. This method is realized by a graph and hypergraph implementation, which can be converted to matrix form. For the conceptual apparatus, rules of the form "If A, then B" and their analogues are used. A model of the functioning of the decision-making system based on a multi-agent approach is described. The results obtained can be used in the creation of simulators and of decision-making systems themselves in various subject areas. By separating data flows into computational and executive ones, data types are distinguished and the entire system is modeled more accurately. Maximizing fulfillment of the goals set for the decision-making system can be achieved with modified network methods such as Markov nets and Petri nets.
Chapter
The resources in the Internet of Things (IoT) are distributed among different physical geographic locations. In centralized resource discovery, the resources are registered in a centralized third-party server, and the clients can discover any resource by querying the centralized entity. In decentralized resource discovery, the task of resource registration and discovery is distributed among many nodes in the system. Replacing a centralized entity with a distributed set of nodes requires that a system fulfil some security and performance requirements. In this paper, the centralized and decentralized resource discovery models are discussed. In addition, the properties of decentralized resource discovery are studied, and some of the fundamental and most important requirements for such models are discussed. Each of the fundamental requirements in decentralized resource discovery is analysed, and the possible approaches and their feasibility in IoT networks are studied.
Keywords: Resource discovery; Internet of Things; P2P; Decentralized architecture
Article
The Internet of Things (IoT) can be defined as an extensive network of interconnected devices that enables any physical object to be part of the worldwide network. The opportunity of everything being interconnected to the internet leads to many other challenges, such as many devices, the exponentially generated data, and the limited resources capacity of such devices (in terms of storage capacity, processing, energy, and accessibility). Decentralized systems and more particularly Peer-to-Peer (P2P) systems can meet the requirements of IoT applications. Peer-to-Peer DHT-Based approaches enable an effective search within a logarithmic cost. However, two issues need to be addressed to suit efficiently resource-constrained IoT systems. The first technical gap is that DHT-Based approaches do not handle network scalability and resources allocation, so they assume both nodes and keys must be in the same space, therefore, they are exclusively determined by the output of the hashing function. The second gap is that service discovery in such approaches is independent of the node’s resources capacity, which makes it ineffective and unsuitable for resource-constrained IoT systems. To cope with such limitations, our approach fundamentally addresses the previous technical gaps in existing DHT-based approaches by the following: Firstly, it deals with the first issue by adopting a novel mapping mechanism for both keys and identifiers based on geometric angles, the same mechanism is applied for nodes auto-configuration. Secondly, it introduces a customized routing mechanism that factors in nodes' processing capacity. We consider the number of hops, routing table size, configuration overhead, and load distribution as performance metrics. The proposed protocol's implementation and simulation with multiple experimentations confirm our approach's out-performance and effectiveness against competing IoT application approaches.
Conference Paper
Full-text available
This paper presents Scribe, a large-scale event notification infrastructure for topic-based publish-subscribe applications. Scribe supports large numbers of topics, with a potentially large number of subscribers per topic. Scribe is built on top of Pastry, a generic peer-to-peer object location and routing substrate overlayed on the Internet, and leverages Pastry’s reliability, self-organization and locality properties. Pastry is used to create a topic (group) and to build an efficient multicast tree for the dissemination of events to the topic’s subscribers (members). Scribe provides weak reliability guarantees, but we outline how an application can extend Scribe to provide stronger ones.
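The tree-building mechanism outlined above can be sketched as follows — a minimal illustration, not the Scribe implementation: `join` takes a precomputed route to the topic's root as a stand-in for Pastry's prefix routing, which is an assumption of this sketch.

```python
# Sketch of Scribe-style multicast-tree construction: each subscriber's
# JOIN is routed toward the topic's root, and every node on the path
# records the previous hop as a child, so the union of routes forms the
# dissemination tree.
from collections import defaultdict

children = defaultdict(set)         # node -> its children in the tree

def join(path_to_root):
    """path_to_root: [subscriber, hop1, ..., root], as routing would produce."""
    for child, parent in zip(path_to_root, path_to_root[1:]):
        if child in children[parent]:
            return                  # JOIN stops at a node already in the tree
        children[parent].add(child)

def disseminate(root):
    """Multicast an event by walking the tree down from the root."""
    reached, stack = set(), [root]
    while stack:
        n = stack.pop()
        reached.add(n)
        stack.extend(children[n] - reached)
    return reached

# Example joins over hypothetical routes toward root R:
join(["A", "X", "R"])   # subscriber A reaches R via forwarder X
join(["B", "X", "R"])   # B's JOIN terminates once it meets the tree at X
join(["C", "R"])        # C is adjacent to the root
```

Because later JOINs stop at the first node already in the tree, interior nodes aggregate subscribers — the property that lets Scribe scale to large groups.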
Conference Paper
Full-text available
We propose a new design for the Domain Name System (DNS) that takes advantage of recent advances in disk storage and multicast distribution technology. In essence, our design consists of geographically distributed servers, called replicated servers, each of which has a complete and up-to-date copy of the entire DNS database. To keep the replicated servers up-to-date, they distribute new resource records over a satellite channel or over terrestrial multicast. The design allows Web sites to dynamically wander and replicate themselves without having to change their URL. The design can also significantly improve the Web surfing experience since it significantly reduces the DNS lookup delay.
Article
Full-text available
Efficiently determining the node that stores a data item in a distributed network is an important and challenging problem. This paper describes the motivation and design of the Chord system, a decentralized lookup service that stores key/value pairs for such networks. The Chord protocol takes as input an m-bit identifier (derived by hashing a higher-level application specific key), and returns the node that stores the value corresponding to that key. Each Chord node is identified by an m-bit identifier and each node stores the key identifiers in the system closest to the node's identifier. Each node maintains an m-entry routing table that allows it to look up keys efficiently. Results from theoretical analysis, simulations, and experiments show that Chord is incrementally scalable, with insertion and lookup costs scaling logarithmically with the number of Chord nodes.
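The lookup structure described above can be sketched on a toy 8-bit ring. This is illustrative only: membership is static, there are no joins or failures, and the key's owner is computed globally just to bound the loop — the routing decisions themselves use only finger tables.

```python
# Toy Chord-style ring: each node's finger-table entry i points at the
# first node at clockwise distance >= 2^i, so a lookup can roughly halve
# the remaining distance per hop.
M = 8                       # identifier bits (the paper's m; small here)
RING = 2 ** M

def successor(nodes, ident):
    """First node clockwise from ident (nodes is a sorted id list)."""
    for n in nodes:
        if n >= ident:
            return n
    return nodes[0]         # wrap around the ring

def in_interval(x, a, b):
    """True iff x lies in the clockwise interval (a, b] on the ring."""
    return a < x <= b if a < b else (x > a or x <= b)

def finger_table(nodes, n):
    return [successor(nodes, (n + 2 ** i) % RING) for i in range(M)]

def lookup(nodes, fingers, start, key):
    """Route greedily via finger tables; return (owning node, hop count)."""
    owner = successor(nodes, key)      # known globally only to bound the loop
    cur, hops = start, 0
    while cur != owner:
        succ = successor(nodes, (cur + 1) % RING)
        if in_interval(key, cur, succ):
            cur = succ                 # the successor owns the key
        else:
            # closest preceding finger that makes progress toward key
            cur = next((f for f in reversed(fingers[cur])
                        if in_interval(f, cur, key)), succ)
        hops += 1
    return cur, hops

NODES = sorted([5, 17, 34, 60, 99, 120, 150, 200, 230, 250])
FINGERS = {n: finger_table(NODES, n) for n in NODES}
```

Each hop either finishes at the successor or jumps via the largest finger that does not overshoot the key, which is what yields the logarithmic lookup cost the abstract reports.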
Article
Full-text available
A name service maps a name of an individual, organization or facility into a set of labeled properties, each of which is a string. It is the basis for resource location, mail addressing, and authentication in a distributed computing system. The global name service described here is meant to do this for billions of names distributed throughout the world. It addresses the problems of high availability, large size, continuing evolution, fault isolation and lack of global trust. The non-deterministic behavior of the service is specified rather precisely to allow a wide range of client and server implementations.
Article
Full-text available
The Cooperative File System (CFS) is a new peer-to-peer readonly storage system that provides provable guarantees for the efficiency, robustness, and load-balance of file storage and retrieval. CFS does this with a completely decentralized architecture that can scale to large systems. CFS servers provide a distributed hash table (DHash) for block storage. CFS clients interpret DHash blocks as a file system. DHash distributes and caches blocks at a fine granularity to achieve load balance, uses replication for robustness, and decreases latency with server selection. DHash finds blocks using the Chord location protocol, which operates in time logarithmic in the number of servers. CFS is implemented using the SFS file system toolkit and runs on Linux, OpenBSD, and FreeBSD. Experience on a globally deployed prototype shows that CFS delivers data to clients as fast as FTP. Controlled tests show that CFS is scalable: with 4,096 servers, looking up a block of data involves contacting only seven servers. The tests also demonstrate nearly perfect robustness and unimpaired performance even when as many as half the servers fail.
Article
This document is an overview of the X.500 standard for people not familiar with the technology. It compares and contrasts Directory Services based on X.500 with several of the other Directory services currently in use in the Internet. This paper also describes the status of the standard and provides references for further information on X.500 implementations and technical information. A primary purpose of this paper is to illustrate the vast functionality of the X.500 protocol and to show how it can be used to provide a global directory for human use, and can support other applications which would benefit from directory services, such as mail programs.
Article
Landmark Routing is a set of algorithms for routing in communications networks of arbitrary size. Landmark Routing is based on a new type of hierarchy, the Landmark Hierarchy. The Landmark Hierarchy exhibits path lengths and routing table sizes similar to those found in the traditional area or cluster hierarchy. The Landmark Hierarchy, however, is easier to dynamically configure using a distributed algorithm. It can therefore be used as the basis for algorithms that dynamically configure the hierarchy on the fly, thus allowing for very large, dynamic networks. This paper describes the Landmark Hierarchy, analyzes it, and compares it with the area hierarchy.
Article
Naming is an important aspect of distributed system design. A naming system allows users and programs to assign character-string names to objects, and subsequently use the names to refer to those objects. With the interconnection of clusters of computers by wide-area networks and internetworks, the domain over which naming systems must function is growing to encompass the entire world. In this paper we address the problem of a global naming system, proposing a three-level naming architecture that consists of global, administrational, and managerial naming mechanisms, each optimized to meet the performance, reliability, and security requirements at its own level. We focus in particular on a decentralized approach to the lower levels, in which naming is handled directly by the managers of the named objects. Client-name caching and multicast are exploited to implement name mapping with almost optimum performance and fault tolerance. We also show how the naming system can be made secure. Our conclusions are bolstered by experience with an implementation in the V distributed operating system.
Conference Paper
We consider an architecture for a serverless distributed file system that does not assume mutual trust among the client computers. The system provides security, availability, and reliability by distributing multiple encrypted replicas of each file among the client machines. To assess the feasibility of deploying this system on an existing desktop infrastructure, we measure and analyze a large set of client machines in a commercial environment. In particular, we measure and report results on disk usage and content; file activity; and machine uptimes, lifetimes, and loads. We conclude that the measured desktop infrastructure would passably support our proposed system, providing availability on the order of one unfilled file request per user per thousand days.
Conference Paper
Consider a set of shared objects in a distributed network, where several copies of each object may exist at any given time. To ensure both fast access to the objects as well as efficient utilization of network resources, it is desirable that each access request be satisfied by a copy "close" to the requesting node. Unfortunately, it is not clear how to achieve this goal efficiently in a dynamic, distributed environment in which large numbers of objects are continuously being created, replicated, and destroyed. In this paper we design a simple randomized algorithm for accessing shared objects that tends to satisfy each access request with a nearby copy. The algorithm is based on a novel mechanism to maintain and distribute information about object locations, and requires only a small amount of additional memory at each node. We analyze our access scheme for a class of cost functions that captures the hierarchical nature of wide-area networks. We show that under the particular cost model considered (i) the expected cost of an individual access is asymptotically optimal, and (ii) if objects are sufficiently large, the memory used for objects dominates the additional memory used by our algorithm with high probability. We also address dynamic changes in both the network and the set of object copies.
Conference Paper
Overcast is an application-level multicasting system that can be incrementally deployed using today's Internet infrastructure. These properties stem from Overcast's implementation as an overlay network. An overlay network consists of a collection of nodes placed at strategic locations in an existing network fabric. These nodes implement a network abstraction on top of the network provided by the underlying substrate network. Overcast provides scalable and reliable single-source multicast using a simple protocol for building efficient data distribution trees that adapt to changing network conditions. To support fast joins, Overcast implements a new protocol for efficiently tracking the global status of a changing distribution tree. Results based on simulations confirm that Overcast provides its added functionality while performing competitively with IP Multicast. Simulations indicate that Overcast quickly builds bandwidth-efficient distribution trees that, compared to IP Multicast, provide 70%-100% of the total bandwidth possible, at a cost of somewhat less than twice the network load. In addition, Overcast adapts quickly to changes caused by the addition of new nodes or the failure of existing nodes without causing undue load on the multicast source.
Article
Univers is a generic attribute-based name server upon which a variety of high-level naming services can be built. This paper defines Univers' underlying attribute-based naming model. It also describes several aspects of its implementation and demonstrates how various naming services—including a global white-pages service, a local yellow-pages service and a conventional name-to-address mapper—can be built on top of Univers.
Article
Hash tables -- which map "keys" onto "values" -- are an essential building block in modern software systems. We believe a similar functionality would be equally valuable to large distributed systems. In this paper, we introduce the concept of a Content-Addressable Network (CAN) as a distributed infrastructure that provides hash table-like functionality on Internet-like scales. The CAN design is scalable, fault-tolerant and completely self-organizing, and we demonstrate its scalability, robustness and low-latency properties through simulation.
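The hash-table-over-coordinates idea can be sketched with a toy 2-D grid. This is illustrative only: in a real CAN the coordinate space is split dynamically as nodes join, whereas here the zones form a fixed k×k grid and the hash-to-point mapping is an assumption of the sketch.

```python
# Toy 2-D CAN: the unit square is split into a K x K grid of zones, one
# node per zone. A key hashes to a point; a message is forwarded
# greedily to the grid neighbour whose zone is closest to that point.
import hashlib

K = 8  # grid side; N = K*K nodes

def key_to_point(key):
    """Deterministically hash a key to a point in [0,1) x [0,1)."""
    h = hashlib.sha256(key.encode()).digest()
    return (h[0] / 256.0, h[1] / 256.0)

def owner_zone(point):
    x, y = point
    return (min(int(x * K), K - 1), min(int(y * K), K - 1))

def neighbours(zone):
    x, y = zone
    cand = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(a, b) for a, b in cand if 0 <= a < K and 0 <= b < K]

def route(start_zone, key):
    """Greedy CAN routing; returns (owner zone, hop count)."""
    tx, ty = owner_zone(key_to_point(key))
    cur, hops = start_zone, 0
    while cur != (tx, ty):
        # move to the neighbour closest to the target zone
        cur = min(neighbours(cur),
                  key=lambda z: (z[0] - tx) ** 2 + (z[1] - ty) ** 2)
        hops += 1
    return cur, hops
```

With an n-node grid the path length grows as O(√n), illustrating why CAN routing state stays constant per node while hop counts grow polynomially rather than logarithmically.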
Article
The explosion of the web has led to a situation where a majority of the traffic on the Internet is web related. Today, practically all of the popular web sites are served from single locations. This necessitates frequent long distance network transfers of data (potentially repeatedly) which results in a high response time for users, and is wasteful of the available network bandwidth. Moreover, it commonly creates a single point of failure between the web site and its Internet provider. This paper presents a new approach to web replication, where each of the replicas resides in a different part of the network, and the browser is automatically and transparently directed to the "best" server. Implementing this architecture for popular web sites will result in a better response-time and a higher availability of these sites. Equally important, this architecture will potentially cut down a significant fraction of the traffic on the Internet, freeing bandwidth for other uses.
Article
We present a design for a system of anonymous storage which resists the attempts of powerful adversaries to find or destroy any stored data. We enumerate distinct notions of anonymity for each party in the system, and suggest a way to classify anonymous systems based on the kinds of anonymity provided. Our design ensures the availability of each document for a publisher-specified lifetime. A reputation system provides server accountability by limiting the damage caused from misbehaving servers. We identify attacks and defenses against anonymous storage services, and close with a list of problems which are currently unsolved.
Article
This paper describes PAST, a large-scale, Internet based, global storage utility that provides high availability, persistence and protects the anonymity of clients and storage providers. PAST is a peer-to-peer Internet application and is entirely self-organizing. PAST nodes serve as access points for clients, participate in the routing of client requests, and contribute storage to the system. Nodes are not trusted, they may join the system at any time and may silently leave the system without warning. Yet, the system is able to provide strong assurances, efficient storage access, load balancing and scalability.
Article
We describe Freenet, an adaptive peer-to-peer network application that permits the publication, replication, and retrieval of data while protecting the anonymity of both authors and readers. Freenet operates as a network of identical nodes that collectively pool their storage space to store data files and cooperate to route requests to the most likely physical location of data. No broadcast search or centralized location index is employed. Files are referred to in a location-independent manner, and are dynamically replicated in locations near requestors and deleted from locations where there is no interest. It is infeasible to discover the true origin or destination of a file passing through the network, and difficult for a node operator to determine or be held responsible for the actual physical contents of her own node.
Article
We have built an HTTP based resource discovery system called Discover that provides a single point of access to over 500 WAIS servers. Discover provides two key services: query refinement and query routing. Query refinement helps a user improve a query fragment to describe the user's interests more precisely. Once a query has been refined and describes a manageable result set, query routing automatically forwards the query to the WAIS servers that contain relevant documents. Abbreviated descriptions of WAIS sites called content labels are used by the query refinement and query routing algorithms. Our experimental results suggest that query refinement in conjunction with query routing provides an effective way to discover resources in a large universe of documents. Our experience with query refinement has convinced us that the expansion of query fragments is essential for using a large, dynamically changing, heterogeneous distributed information system.