Article

GFFS: The XSEDE Global Federated File System

Abstract

A federated, secure, standardized, scalable, and transparent mechanism to access and share resources, particularly data resources, across organizational boundaries, one that does not require application modification and does not disrupt existing data access patterns, has been needed for some time in the computational science community. The Global Federated File System (GFFS) addresses this need and is a foundational component of the NSF-funded eXtreme Science and Engineering Discovery Environment (XSEDE) program. The GFFS allows user applications to access (create, read, update, delete) remote resources in a location-transparent fashion. Existing applications, whether they are statically linked binaries, dynamically linked binaries, or scripts (shell, Perl, Python), can access resources anywhere in the GFFS without modification (subject to access control). In this paper we present an overview of the GFFS and its most common use cases: accessing data at an NSF center from a home or campus machine, accessing data on a campus machine from an NSF center, directly sharing data with a collaborator at another institution, accessing remote computing resources, and interacting with remote running jobs. We present these use cases and how they are realized using the GFFS.
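
A minimal sketch of the location transparency described above, assuming the GFFS namespace has been mounted (for example via FUSE) at a local path; the mount point /mnt/gffs and the directory layout below are hypothetical. An unmodified script then needs nothing beyond ordinary file-system calls, which is equally true of statically linked binaries and shell scripts, since the indirection happens at the file-system layer rather than in the application.

```python
# Minimal sketch, assuming a hypothetical FUSE mount of the GFFS at /mnt/gffs.
# An unmodified script performs create/read/update/delete with plain file I/O.
import os

remote_dir = "/mnt/gffs/home/alice/results"      # hypothetical GFFS path
os.makedirs(remote_dir, exist_ok=True)

path = os.path.join(remote_dir, "run-001.txt")
with open(path, "w") as f:                       # create/update a remote resource
    f.write("energy = -1.2345\n")

with open(path) as f:                            # read it back, location-transparently
    print(f.read().strip())

os.remove(path)                                  # delete (subject to access control)
```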


... The analysis of this dataset aims to cluster special data points and identify any noise elements. In this paper we describe the architectural design and implementation necessary to run data analysis jobs on HPC resources through the standards-based UNICORE middleware [10], using data from the Global Federated File System [28], which we derive from the architecture of the Extreme Science and Engineering Discovery Environment (XSEDE) [24]. UNICORE is an HPC middleware deployed in production on XSEDE supercomputing sites, whereas the GFFS is a distributed network file system that is an integral component of the Genesis II platform, which is also considered a middleware element in XSEDE. ...
... System. The Genesis II Global Federated File System (GFFS) [28] is a distributed file system that provides researchers with tools for securely managing and sharing their scientific data. The GFFS offers a set of interfaces that manage jobs and provide access to the required scientific data. ...
Article
Full-text available
An emerging challenge for scientific communities is to efficiently process big data obtained from experimentation and computational simulations. Supercomputing architectures are available to support scalable, high-performance processing environments, but many existing algorithm implementations are still unable to cope with their architectural complexity. One approach is to provide innovative technologies that use these resources effectively and also handle large, geographically dispersed datasets. Those technologies should be accessible in a way that data scientists running data-intensive computations do not have to deal with the technical intricacies of the underlying execution system. Our work primarily focuses on providing data scientists with transparent access to these resources so that they can easily analyze data. We describe the impact of our work by showing how we enabled access to multiple high-performance computing resources through an open, standards-based middleware that takes advantage of the unified data management provided by the Global Federated File System. Our architectural design and its associated implementation are validated by a use case that requires massively parallel DBSCAN outlier detection on a 3D point-cloud dataset.
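
As a small-scale illustration of the analysis step only (the paper's use case runs a massively parallel DBSCAN on HPC resources through UNICORE), scikit-learn's DBSCAN marks low-density points of a 3D point cloud as noise; the synthetic data and the eps/min_samples values below are illustrative assumptions.

```python
# DBSCAN labels low-density points as noise (label -1). Synthetic 3D data and
# parameter values are assumptions; the paper uses a parallel HPC implementation.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
cluster = rng.normal(loc=0.0, scale=0.1, size=(500, 3))   # one dense structure
noise = rng.uniform(low=-2.0, high=2.0, size=(20, 3))     # scattered outliers
points = np.vstack([cluster, noise])

labels = DBSCAN(eps=0.2, min_samples=10).fit_predict(points)
outliers = points[labels == -1]
print(f"{len(outliers)} of {len(points)} points flagged as noise")
```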
... The proposed XSEDE Campus Bridging (CB) Shared Virtual Compute Facility (SVCF) [22] provides both. The CB-SVCF will be deployed using existing XSEDE Execution Management Services (EMS) and the XSEDE Global Federated File System (GFFS) [13]. ...
... However, "FCFS" policies on shared resources often lead to organizationally sub-optimal outcomes, since not all jobs have the same value for the researchers, and high-value jobs often have to wait in the queue while low-value jobs are running. This leads funded researchers to buy their own clusters rather than use the shared resources, which are frequently underutilized. To address the problem, we have developed a market-based model for shared resources at universities, building on what XSEDE calls a "Shared Virtual Compute Facility" [13] (use case 6). In an SVCF, campus or research groups can aggregate their compute resources into a single virtual environment where users submit jobs to a single queue and the jobs may run on resources at any of the campuses. The basics of this approach have already been in production at the University of Virginia for over three years. ...
Article
Computational and data scientists at universities have different job resource requirements. Most universities maintain a set of shared resources to support these computational needs. Access to these resources is often free, and the access policy is First Come First Serve (FCFS). However, FCFS policies on shared resources often lead to sub-optimal value from the organization's point of view, as different jobs contribute different value to their users. Furthermore, the set of resources at a single institution may fail to satisfy the diverse needs of the institution's researchers. We argue the solution is differentiated quality of service (QoS) based on the user's willingness to pay, to rationalize resource usage, combined with federation of university resources, to improve both the size and the diversity of the resource pool. The proposed XSEDE Campus Bridging (CB) Shared Virtual Compute Facility (SVCF) [22] provides both. The CB-SVCF will be deployed using existing XSEDE Execution Management Services (EMS) and the XSEDE Global Federated File System (GFFS) [13]. Before deploying the CB-SVCF it is critical to understand and demonstrate the expected benefits to stakeholders under different load, pricing, and priority scenarios. We have developed a simulator to understand these trade-offs. In this paper we present simulation results with two qualities of service and data traces from two universities for a month. Our results show that MBoDS outperforms both FCFS and a simple priority scheduling policy for a single site, both in overall value and in the number of high-priority jobs started within the predefined threshold.
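
A toy sketch of the trade-off the simulator explores, assuming a single resource, invented jobs, and a value-weighted waiting-time metric; the real study replays month-long traces from two universities and models the MBoDS policy, which this sketch does not attempt.

```python
# Toy single-resource scheduler contrasting FCFS with value-based priority.
# Jobs and the metric are invented; lower weighted wait = high-value jobs start sooner.
from dataclasses import dataclass

@dataclass
class Job:
    arrival: float   # submission time (hours)
    runtime: float   # execution time (hours)
    value: float     # value of the job to its researcher

def simulate(jobs, by_value):
    t, weighted_wait = 0.0, 0.0
    pending = list(jobs)
    while pending:
        ready = [j for j in pending if j.arrival <= t] or \
                [min(pending, key=lambda j: j.arrival)]
        pick = max(ready, key=lambda j: j.value) if by_value else \
               min(ready, key=lambda j: j.arrival)
        pending.remove(pick)
        t = max(t, pick.arrival)
        weighted_wait += pick.value * (t - pick.arrival)
        t += pick.runtime
    return weighted_wait

jobs = [Job(0, 5, 1), Job(1, 5, 1), Job(2, 1, 10), Job(3, 1, 10)]
print("FCFS     value-weighted wait:", simulate(jobs, by_value=False))
print("Priority value-weighted wait:", simulate(jobs, by_value=True))
```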
... Due to this dependence on a single centralized leader, these Paxos implementations support deployments within a local area and cannot deal well with write-intensive scenarios across wide-area networks (WANs). In recent years, however, coordination over wide-area networks (e.g., across zones, such as datacenters and sites) has gained greater importance, especially for database applications and NewSQL datastores [10], [11], [12], distributed filesystems [13], [14], [15], and social networks [16], [17]. ...
Article
We present WPaxos, a multileader wide-area network (WAN) Paxos protocol that achieves low-latency, high-throughput consensus across WAN deployments. WPaxos dynamically partitions the global object-space across multiple concurrent leaders that are deployed strategically using flexible quorums. This partitioning and emphasis on local operations allow our protocol to significantly outperform leaderless approaches, such as EPaxos, while maintaining the same consistency guarantees. Unlike statically partitioned multiple Paxos deployments, WPaxos adapts dynamically to changing access locality through adaptive object stealing. The ability to quickly react to changing access locality not only speeds up the protocol, but also enables support for mini-transactions. We implemented WPaxos and evaluated it across WAN deployments using the benchmarks introduced in the EPaxos work. Our results show that WPaxos achieves up to 18 times faster average request latency and 65 times faster median latency than EPaxos due to the reduction in WAN communication.
Article
Client-side metadata prefetching is commonly used in wide area network (WAN) file systems because it can effectively hide network latency. However, most existing prefetching approaches do not meet the varied prefetching requirements of multiple workloads. They are usually optimized for only one specific workload and have no effect, or even harmful effects, on other workloads. In this paper, we present a new self-tuning client-side metadata prefetching scheme that uses two different prefetching strategies and dynamically adapts to workload changes. It uses a directory-directed prefetching strategy to prefetch the related file metadata in the same directory, and a correlation-directed prefetching strategy to prefetch the related file metadata accessed across directories. A novel self-tuning mechanism is proposed to efficiently switch the prefetching strategy between directory-directed and correlation-directed prefetching. Experimental results using real system traces show that the hit ratio of the client-side cache can be significantly improved by our self-tuning client-side prefetching. For the multi-workload concurrency scenario, our approach improves the hit ratios for the no-prefetching, directory-directed prefetching, variant probability graph algorithm, variant apriori algorithm, and variant semantic distance algorithm by up to 15.22%, 6.32%, 10.08%, 11.65%, and 10.73%, corresponding to 25.24%, 18.11%, 23.53%, 24.94%, and 24.19% reductions in the average access time, respectively.
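
A minimal sketch of the directory-directed half of the scheme, assuming an invented per-miss prefetch budget and using a local stat() call as a stand-in for the WAN metadata request; the correlation-directed strategy and the self-tuning switch between the two are not modeled here.

```python
# Directory-directed prefetch sketch: on a cache miss, fetch the requested entry
# and prefetch metadata for a few siblings in the same directory (directory locality).
import os

PREFETCH_BUDGET = 8      # assumed number of sibling entries prefetched per miss
cache = {}               # path -> metadata

def fetch_from_server(path):
    """Stand-in for a high-latency WAN metadata request."""
    return os.stat(path)

def lookup(path):
    if path in cache:                          # cache hit: no WAN round trip
        return cache[path]
    cache[path] = fetch_from_server(path)      # miss: pay one round trip
    parent = os.path.dirname(path) or "."
    prefetched = 0
    for name in sorted(os.listdir(parent)):    # directory-directed prefetch
        sibling = os.path.join(parent, name)
        if sibling not in cache:
            cache[sibling] = fetch_from_server(sibling)
            prefetched += 1
            if prefetched >= PREFETCH_BUDGET:
                break
    return cache[path]

print(lookup(__file__))
```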
Article
WPaxos is a multileader Paxos protocol that provides low-latency and high-throughput consensus across wide-area network (WAN) deployments. WPaxos uses multiple leaders and partitions the object-space among them. Unlike statically partitioned multiple Paxos deployments, WPaxos is able to adapt to changing access locality through object stealing. Multiple concurrent leaders residing in different zones steal ownership of objects from each other using phase-1 of Paxos, and then use phase-2 to commit update requests on these objects locally until they are stolen by other leaders. To achieve fast phase-2 commits, WPaxos adopts the flexible quorums idea in a novel manner, and appoints phase-2 acceptors to be close to their respective leaders. We implemented WPaxos and evaluated it over WAN deployments across 5 AWS regions. The dynamic partitioning of the object-space and emphasis on zone-local commits allow WPaxos to significantly outperform both partitioned Paxos deployments and leaderless Paxos approaches.
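
A small sketch of the flexible-quorums property that makes zone-local phase-2 commits safe: every phase-1 quorum (used when stealing an object) must intersect every phase-2 quorum (used for local commits). The grid-style quorum layout below is an illustrative assumption rather than the exact WPaxos quorum system.

```python
# Flexible-quorums sketch: phase-1 quorums span zones, phase-2 quorums stay
# zone-local; safety requires every Q1 to intersect every Q2 (FPaxos condition).
from itertools import product

ZONES, PER_ZONE = 3, 3
# phase-1 quorums: any choice of one acceptor (zone, index) from every zone
q1_quorums = [{(z, pick[z]) for z in range(ZONES)}
              for pick in product(range(PER_ZONE), repeat=ZONES)]
# phase-2 quorums: every acceptor within a single zone (fast, local commits)
q2_quorums = [{(z, i) for i in range(PER_ZONE)} for z in range(ZONES)]

safe = all(q1 & q2 for q1, q2 in product(q1_quorums, q2_quorums))
print("every phase-1 quorum intersects every phase-2 quorum:", safe)
```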
Conference Paper
Universities struggle to provide both the quantity and diversity of compute resources that their researchers need, when they need them. Purchasing resources to meet peak demand for all resource types is cost-prohibitive for all but a few institutions. Renting capacity on commercial clouds is seen as an alternative to owning; commercial clouds, though, expect to be paid. The Campus Compute Cooperative (CCC) offers an alternative to purchasing capacity from commercial providers, delivering increased value to member institutions at reduced cost. Member institutions trade their resources with one another to meet local peak demand as well as to provide access to resource types not available on the local campus but available elsewhere. Participating institutions have dual roles: first as consumers of resources, when their researchers use CCC machines, and second as producers of resources, when CCC users from other institutions use their resources. To avoid the tragedy of the commons, in which everyone only wants to consume resources, resource providers receive credit when their resources are used by others. The consumer is charged based on the quality of service (high, medium, low) and the particulars of the resource provided (speed, interconnection network, memory, etc.). Account balances are cleared monthly. This paper describes solutions to both the technical and sociopolitical challenges of federating university resources and early results with the CCC. Technical issues include the security model, accounting, job specification/management, and user interfaces. Socio-political issues include institutional risk management, how to manage market forces and incentives to avoid sub-optimal outcomes, and budget predictability.
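
A hedged sketch of the charge/credit bookkeeping described above: a consumer institution is debited for a job based on the quality of service and the resource used, and the provider institution is credited the same amount, with balances cleared monthly. The rate table, multipliers, and institution names are invented for illustration; the paper defines the actual pricing.

```python
# Cooperative accounting sketch: debit the consumer, credit the provider.
# Rates and QoS multipliers are assumptions, not the CCC's actual prices.
from collections import defaultdict

QOS_MULTIPLIER = {"high": 2.0, "medium": 1.0, "low": 0.5}    # assumed
BASE_RATE_PER_CORE_HOUR = {"standard": 0.02, "gpu": 0.20}    # assumed, in credits

balances = defaultdict(float)   # institution -> net credits

def settle_job(consumer, provider, resource_type, core_hours, qos):
    charge = BASE_RATE_PER_CORE_HOUR[resource_type] * core_hours * QOS_MULTIPLIER[qos]
    balances[consumer] -= charge    # consumer pays for capacity used elsewhere
    balances[provider] += charge    # provider earns credit for sharing capacity
    return charge

settle_job("UVA", "IU", "gpu", core_hours=128, qos="high")
settle_job("IU", "UVA", "standard", core_hours=4000, qos="low")
print(dict(balances))               # balances would be cleared at month end
```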
Conference Paper
With ever-expanding datasets, efficient data management in grids becomes important. This paper describes Cabinet, which employs two techniques for efficiently managing data in grids: a caching system and a new file-staging approach called coordinated staging. The caching system is designed based on the characteristics of grid applications. Coordinated staging is based on the BitTorrent protocol model and is specifically designed for High Throughput Computing (HTC) applications, a common use case for grids. In coordinated staging, each site that is assigned to execute an individual job of the HTC application treats the other execution sites as potential replica stores. In our evaluation, we show that coordinated staging lowered the download time of a file by 3.85x and increased the throughput of the download by 2.86x over the conventional approach of file transfer from a single source.
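
A rough sketch of the coordinated-staging idea, under simplified assumptions about chunking and peer choice: each execution site pulls pieces of the input file from peer sites that already hold them and contacts the original source only when no peer does, which offloads the single origin server.

```python
# Coordinated-staging sketch: execution sites of one HTC run act as replica
# stores for each other. Chunking and peer selection are simplified assumptions.
import random

CHUNKS = list(range(8))                        # input file split into 8 pieces
sites = {f"site{i}": set() for i in range(4)}  # pieces each execution site holds

def stage_file(site):
    for chunk in CHUNKS:
        peers = [s for s, held in sites.items() if s != site and chunk in held]
        source = random.choice(peers) if peers else "origin"
        sites[site].add(chunk)                 # "transfer" the piece
        print(f"{site} fetched chunk {chunk} from {source}")

for s in sites:        # later sites pull mostly from peers, not the origin
    stage_file(s)
```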
Article
Full-text available
The purpose of a distributed file system (DFS) is to allow users of physically distributed computers to share data and storage resources by using a common file system. A typical configuration for a DFS is a collection of workstations and mainframes connected by a local area network (LAN). A DFS is implemented as part of the operating system of each of the connected computers. This paper establishes a viewpoint that emphasizes the dispersed structure and decentralization of both data and control in the design of such systems. It defines the concepts of transparency, fault tolerance, and scalability and discusses them in the context of DFSs. The paper claims that the principle of distributed operation is fundamental for a fault-tolerant and scalable DFS design. It also presents alternatives for the semantics of sharing and methods for providing access to remote files. A survey of contemporary UNIX-based systems, namely, UNIX United, Locus, Sprite, Sun's Network File System, and ITC's Andrew, illustrates the concepts and demonstrates various implementations and design alternatives. Based on the assessment of these systems, the paper makes the point that a departure from the approach of extending centralized file systems over a communication network is necessary to accomplish sound distributed file system design.
Article
Full-text available
The Information Technology Center (ITC), a collaborative effort between IBM and Carnegie-Mellon University, is in the process of creating Andrew, a prototype computing and communication system for universities. This article traces the origins of Andrew, discusses its goals and strategies, and gives an overview of the current status of its implementation and usage.
Article
To expand the use of distributed computer infrastructures as well as facilitate grid interoperability, OGSA has developed standards and specifications that address a range of scenarios, including high-throughput computing, federated data management, and service mobility.
Article
In an increasing number of scientific disciplines, large data collections are emerging as important community resources. In this paper, we introduce design principles for a data management architecture called the data grid. We describe two basic services that we believe are fundamental to the design of a data grid, namely, storage systems and metadata management. Next, we explain how these services can be used to develop higher-level services for replica management and replica selection. We conclude by describing our initial implementation of data grid functionality.
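
A sketch, under illustrative assumptions, of the two higher-level services mentioned above: a replica catalog that maps a logical file name to its physical copies (replica management) and a selector that picks the copy with the best estimated transfer rate (replica selection). The catalog entries, URLs, and bandwidth table below are invented.

```python
# Data-grid sketch: a replica catalog (logical name -> physical copies) plus a
# simple replica selector driven by an assumed per-site bandwidth estimate.
replica_catalog = {
    "lfn://experiment/run42/events.dat": [
        "gsiftp://storage.site-a.edu/data/run42/events.dat",
        "gsiftp://storage.site-b.edu/replicas/events.dat",
    ]
}
estimated_mbps = {"site-a.edu": 120.0, "site-b.edu": 800.0}   # e.g. from monitoring

def select_replica(logical_name):
    def rate(url):
        host = url.split("/")[2]                  # e.g. storage.site-b.edu
        site = ".".join(host.split(".")[-2:])     # e.g. site-b.edu
        return estimated_mbps.get(site, 0.0)
    return max(replica_catalog[logical_name], key=rate)

print(select_replica("lfn://experiment/run42/events.dat"))
```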
Conference Paper
Data replication is a key issue in a data grid and can be managed in different ways and at different levels of granularity: for example, at the file level or the object level. In the high-energy physics community, data grids are being developed to support the distributed analysis of experimental data. We have produced a prototype data replication tool, the Grid Data Management Pilot (GDMP), that is in production use in one physics experiment, with middleware provided by the Globus toolkit used for authentication, data movement, and other purposes. We present a new, enhanced GDMP architecture and prototype implementation that uses Globus data-grid tools for efficient file replication. We also explain how this architecture can address object replication issues in an object-oriented database management system. File transfer over wide-area networks requires specific performance tuning in order to obtain optimal data transfer rates. We present performance results obtained with GridFTP, an enhanced version of FTP, and discuss tuning parameters.
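
The tuning discussion above can be made concrete with a back-of-the-envelope estimate that is not taken from the paper: a single loss-limited TCP flow roughly follows the Mathis approximation MSS / (RTT * sqrt(loss)), and opening N parallel streams, one of GridFTP's tuning parameters, scales the aggregate accordingly. All numbers below are assumptions.

```python
# Rough WAN-transfer arithmetic: Mathis single-stream estimate times N streams.
# Path parameters are illustrative assumptions, not measurements from the paper.
from math import sqrt

mss_bytes, rtt_s, loss = 1460, 0.08, 1e-4     # assumed WAN path parameters

def single_stream_bps(mss, rtt, p):
    return (mss * 8) / (rtt * sqrt(p))        # Mathis et al. approximation

for streams in (1, 4, 16):
    agg = streams * single_stream_bps(mss_bytes, rtt_s, loss)
    print(f"{streams:>2} streams -> ~{agg / 1e6:.0f} Mbit/s aggregate")
```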
Article
Naming transparencies, i.e., abstracting the name and binding of the entity being used from the endpoints that are actually doing the work, are used in distributed systems to simplify application development by hiding the complexity of the environment. In this paper we demonstrate how to apply traditional distributed systems naming and binding techniques in the Web Services realm. Specifically, we show how the WS-Naming profile on WS-Addressing Endpoint References can be used for identity, transparent failover, replication, and migration. We begin with a discussion of the traditional distributed systems transparencies. We then present four detailed use cases. Next, we provide brief background on both WS-Addressing and WS-Naming. Finally, we show how WS-Naming can be used to provide transparent implementations of our use cases.
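
A language-agnostic sketch of the naming indirection the paper applies to Web Services, assuming an invented identity URN, endpoint table, and transport stub (not the WS-Naming or WS-Addressing API): the client binds to a stable abstract identity, and a resolver supplies whichever concrete endpoint currently implements it, so failover, migration, or replication only changes the resolver's answer.

```python
# Naming-indirection sketch: resolve a stable identity to its current endpoints
# and retry on failure, so the caller never hard-codes a concrete address.
endpoints = {
    "urn:example:service:jobmgr": [
        "https://nodeA.example.org:8443/jobmgr",
        "https://nodeB.example.org:8443/jobmgr",   # replica tried on failover
    ]
}

def invoke(address, request):
    """Stand-in for a SOAP/HTTP call; nodeA is 'down' in this sketch."""
    if "nodeA" in address:
        raise ConnectionError(f"{address} unreachable")
    return f"handled '{request}' at {address}"

def call(identity, request):
    last_error = None
    for address in endpoints[identity]:            # re-bind transparently
        try:
            return invoke(address, request)
        except ConnectionError as err:
            last_error = err                       # try the next endpoint
    raise last_error

print(call("urn:example:service:jobmgr", "status"))
```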
Article
A summary of and historical perspective on work done to implement easy-to-share distributed file systems based on the Unix model are presented. Andrew and Coda are distributed Unix file systems that embody many of the recent advances in solving the problem of data sharing in large, physically dispersed workstation environments. The Andrew architecture is presented, and the scalability and security of the system are discussed. The Coda system is examined, with emphasis on its high availability.