Conference Paper

MySRB and SRB - components of a Data Grid

San Diego Supercomputer Center, University of California, San Diego, CA
DOI: 10.1109/HPDC.2002.1029930. Conference: Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11), 2002.
Source: IEEE Xplore

ABSTRACT: Data Grids are becoming increasingly important in scientific communities for sharing large data collections and for archiving and disseminating them in a digital library framework. The Storage Resource Broker (SRB) provides transparent, virtualized middleware for sharing data across distributed, heterogeneous data resources separated by different administrative and security domains. MySRB is a Web-based interface that gives users friendly access to the distributed collections brokered by the SRB. In this paper we briefly describe the use of the SRB infrastructure as a tool in the data grid architecture for building distributed data collections, digital libraries, and persistent archives. We also provide details about MySRB and its functionality.
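The data-virtualization idea described in the abstract (clients name data by logical path, and the broker decides which physical resource actually holds it) can be sketched in a few lines. This is a minimal illustration of the concept only, not the SRB API: the StorageBroker class, its methods, and the resource names are all hypothetical.

```python
class StorageBroker:
    """Toy data-virtualization layer: logical names map to
    (resource, physical key) pairs, hiding where data lives.
    Illustrative only; not the actual SRB interface."""

    def __init__(self):
        self.catalog = {}    # logical name -> (resource, physical key)
        self.resources = {}  # resource name -> dict standing in for a store

    def add_resource(self, name):
        self.resources[name] = {}

    def put(self, logical, data, resource):
        # Assign a physical key local to the chosen resource.
        key = f"{resource}:{len(self.resources[resource])}"
        self.resources[resource][key] = data
        self.catalog[logical] = (resource, key)

    def get(self, logical):
        # Clients never see which resource served the request.
        resource, key = self.catalog[logical]
        return self.resources[resource][key]

broker = StorageBroker()
broker.add_resource("archive_a")
broker.add_resource("disk_b")
broker.put("/collection/obs1.dat", b"raw bytes", resource="archive_a")
broker.put("/collection/obs2.dat", b"more bytes", resource="disk_b")
```

The point of the sketch is that `get` takes only the logical name; crossing an administrative domain boundary changes the catalog entry, not the client code.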


Available from: Reagan Wentworth Moore, Jun 04, 2015
  •
    ABSTRACT: Grids provide an infrastructure for seamless, secure access to a globally distributed set of shared computing resources. Grid computing has reached the stage where deployments run in production mode. In the most active Grid community, the scientific community, jobs are data- and compute-intensive. Scientific Grid deployments offer the opportunity to revisit, and perhaps update, traditional beliefs about workload models, and hence to reevaluate traditional resource management techniques. In this thesis, we study usage patterns from a large-scale scientific Grid collaboration in high-energy physics. We focus mainly on data usage, since data is the major resource for this class of applications. We perform a detailed workload characterization, which leads us to propose a new data abstraction, the filecule, that groups correlated files. We characterize filecules and show that they are an appropriate data granularity for resource management. In scientific applications, job scheduling and data staging are tightly coupled. The only algorithm previously proposed for this class of applications, Greedy Request Value (GRV), uses a function that assigns a relative value to each job. We wrote a cache simulator that uses the same technique of combining cache replacement with job reordering to quantitatively evaluate and compare a set of alternative solutions. These solutions combine Least Recently Used (LRU) and GRV from the cache-replacement space with First-Come First-Served (FCFS) and GRV-specific job reordering from the scheduling space. Using a real workload from the DZero experiment at Fermi National Accelerator Laboratory, we measure and compare performance in terms of byte hit rate, cache change, job waiting time, job waiting-queue length, and scheduling overhead. Based on our experimental investigations, we propose a new technique that combines LRU for cache replacement with job scheduling based on the relative request value. This technique incurs lower data-transfer costs than the GRV algorithm and shorter job-processing delays than FCFS. We also propose using filecules for data management to further improve the results obtained from the above LRU and GRV combination. We show that filecules can be identified in practical situations, and we demonstrate how the accuracy of filecule identification influences caching performance.
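One half of the technique this abstract evaluates, LRU cache replacement measured by byte hit rate under first-come first-served request order, can be sketched with a toy simulator. This is a simplified illustration under assumed file names and sizes, not the thesis's actual simulator, and it omits job reordering and filecule grouping entirely.

```python
from collections import OrderedDict

class LRUByteCache:
    """Byte-capacity cache with LRU eviction, tracking byte hit rate
    (one of the metrics named in the abstract). Illustrative only."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.files = OrderedDict()  # name -> size, least recently used first
        self.hit_bytes = 0
        self.total_bytes = 0

    def request(self, name, size):
        self.total_bytes += size
        if name in self.files:
            self.files.move_to_end(name)    # refresh recency on a hit
            self.hit_bytes += size
            return True
        # Miss: evict least recently used files until the new file fits.
        while self.files and sum(self.files.values()) + size > self.capacity:
            self.files.popitem(last=False)
        if size <= self.capacity:
            self.files[name] = size
        return False

    def byte_hit_rate(self):
        return self.hit_bytes / self.total_bytes if self.total_bytes else 0.0

# FCFS ordering: requests are served strictly in arrival order.
cache = LRUByteCache(capacity_bytes=100)
workload = [("a", 40), ("b", 50), ("a", 40), ("c", 30), ("a", 40)]
for name, size in workload:
    cache.request(name, size)
# Requesting "c" evicts "b" (least recently used), so both later
# requests for "a" hit: 80 of 200 bytes served from cache.
```

A GRV-style scheduler would instead reorder the workload by request value before replaying it through the same cache, which is exactly the coupling between replacement and reordering the abstract describes.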
  •
    ABSTRACT: In order to support the user in composing heterogeneous distributed applications, it is necessary to have suitable higher-level services that hide the complexity of lower-level details. One approach to such a higher-level service is the Grid Job Handler, which we established within the Fraunhofer Resource Grid on top of the Globus toolkit. The most innovative part of the Grid Job Handler is the Petri-net-based workflow model, which allows the definition of arbitrary workflows with only three different components: transitions, places, and arcs. This enables the easy orchestration of complex workflows, including conditions and loops, and covers the dataflow as well as the control flow of distributed applications. The dynamic workflow model introduced here takes advantage of Petri net refinement theory, which allows additional tasks to be added to the workflow during runtime, such as transfer tasks, software deployment tasks, or fault management tasks. Within this framework, distributed applications can be defined independently of the infrastructure, just by connecting software components and data. The resource mapping, which maps abstract resource requirements onto real resources, is based on an XML-based resource definition language that includes information about the dependencies between resources.
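The three-component workflow model this abstract describes (transitions, places, arcs) is a standard Petri net: places hold tokens, and a transition may fire when every input place holds a token. A minimal sketch of those semantics, with a hypothetical two-step stage-then-run workflow, not the Grid Job Handler's actual implementation:

```python
class PetriNet:
    """Minimal Petri net: places hold token counts, transitions fire
    when all their input places are marked. Illustrative only."""

    def __init__(self):
        self.marking = {}      # place name -> token count
        self.transitions = {}  # transition name -> (input places, output places)

    def add_place(self, name, tokens=0):
        self.marking[name] = tokens

    def add_transition(self, name, inputs, outputs):
        # Arcs are implicit in the input/output place lists.
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking[p] > 0 for p in inputs)

    def fire(self, name):
        inputs, outputs = self.transitions[name]
        if not self.enabled(name):
            raise RuntimeError(f"transition {name!r} is not enabled")
        for p in inputs:
            self.marking[p] -= 1   # consume a token from each input place
        for p in outputs:
            self.marking[p] += 1   # produce a token in each output place

# Hypothetical workflow: stage the input data, then run the job.
net = PetriNet()
net.add_place("input_ready", tokens=1)
net.add_place("data_staged")
net.add_place("done")
net.add_transition("stage", ["input_ready"], ["data_staged"])
net.add_transition("run_job", ["data_staged"], ["done"])

net.fire("stage")     # consumes input_ready, marks data_staged
net.fire("run_job")   # consumes data_staged, marks done
```

Refinement, as the abstract uses it, would mean replacing a single transition such as `run_job` with a subnet (e.g. deploy-software, transfer, execute) at runtime while preserving the surrounding places.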
  •
    ABSTRACT: Distributed digital repositories can be used to address critical issues of long-term digital preservation and disaster management for large data centers. A policy-driven system provides an ideal solution for managing distributed repositories that require high flexibility and high configurability. Recent studies demonstrate that the integrated Rule-Oriented Data System (iRODS), a peer-to-peer server middleware, provides the dynamic extensibility needed to manage time-varying policies, automate validation of assessment criteria, manage ingestion processes, manage access policies, and manage preservation policies. Policy management can be implemented underneath existing digital-library infrastructure such as Fedora.
    International Journal on Digital Libraries, 12(1), July 2012. DOI: 10.1007/s00799-012-0082-3
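The rule-oriented design this last abstract describes can be illustrated with a small event-driven policy registry: repository events (ingest, access, audit) trigger whatever rules are currently registered, so policies can be swapped at runtime without touching the storage layer. This is a conceptual sketch only; the PolicyEngine class, event names, and rules below are hypothetical and do not reflect the iRODS rule-engine API.

```python
class PolicyEngine:
    """Toy policy-driven manager: events fire registered rules.
    Time-varying policy = re-registering rules at runtime."""

    def __init__(self):
        self.rules = {}  # event name -> list of rule callables

    def register(self, event, rule):
        self.rules.setdefault(event, []).append(rule)

    def apply(self, event, obj):
        # Run every rule currently bound to this event, in order.
        for rule in self.rules.get(event, []):
            rule(obj)

audit_log = []

def checksum_rule(obj):
    # Hypothetical fixity policy: record a simple checksum at ingest.
    obj.setdefault("checksum", sum(obj["name"].encode()) & 0xFFFF)

def audit_rule(obj):
    # Hypothetical assessment policy: log the ingestion for validation.
    audit_log.append(("ingested", obj["name"]))

engine = PolicyEngine()
engine.register("ingest", checksum_rule)
engine.register("ingest", audit_rule)

record = {"name": "survey.dat"}
engine.apply("ingest", record)
```

Because preservation, access, and ingestion policies are just lists of callables keyed by event, validating an assessment criterion amounts to replaying the audit log against the currently registered rules, which is the kind of automated validation the abstract attributes to the rule-oriented approach.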