Conference Paper

Experiences with Hierarchical Storage Management Support in Blue Whale File System.

DOI: 10.1109/PDCAT.2010.35 Conference: 2010 International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2010, Wuhan, China, 8-11 December, 2010
Source: DBLP

ABSTRACT: To meet the challenges of significant storage and application growth, shortened backup windows, and limited IT resources, more and more organizations are embracing Hierarchical Storage Management (HSM). Some SAN file systems support HSM, but little detail has been published about their design considerations or implementation. In this paper, we present the design considerations for HSM support in the Blue Whale File System (BWFS), a high-performance SAN file system, and describe our implementation. Events, migration, and attributes constitute the three aspects of the data management mechanism in BWFS. For READ/WRITE/TRUNCATE events, the problem is that a client accesses the storage devices containing file data directly rather than through a metadata server, so we monitor the LayoutGet operation instead of the real I/O path. BWFS is a multi-volume file system that supports a single name space across multiple data volumes. When different storage devices are managed by BWFS, migration takes place between volumes within BWFS. We also implement migration between BWFS and an external repository. We believe the solutions to the problems we encountered will be useful to file system developers.
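The idea of monitoring LayoutGet instead of the real I/O path can be sketched as follows. This is a minimal illustration under stated assumptions, not the BWFS implementation: the names `MetadataServer`, `FileRecord`, `IoMode`, `HsmEvent`, `layout_get`, and `recall` are all hypothetical, the layout request is assumed to carry a read or read-write intent, and the recall step only flips a flag where a real system would copy data back from the external repository. TRUNCATE handling is omitted.

```python
from dataclasses import dataclass, field
from enum import Enum


class IoMode(Enum):
    """Intent carried by a layout request (hypothetical)."""
    READ = "read"
    RW = "rw"


class HsmEvent(Enum):
    """Data management events raised by the metadata server (hypothetical)."""
    READ = "READ"
    WRITE = "WRITE"


@dataclass
class FileRecord:
    name: str
    volume: str
    migrated: bool = False  # True while the data lives in the external repository


@dataclass
class MetadataServer:
    files: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

    def layout_get(self, name, iomode):
        """Hand out the file's location, raising an HSM event first.

        Clients read and write the storage devices directly, so the
        metadata server never sees the I/O itself. The layout request is
        the last point at which it can observe the access.
        """
        rec = self.files[name]
        event = HsmEvent.READ if iomode is IoMode.READ else HsmEvent.WRITE
        self.events.append((event, name))
        if rec.migrated:
            self.recall(rec)  # data must be back before the client does I/O
        return rec.volume

    def recall(self, rec):
        """Placeholder recall: a real system would copy the data back."""
        rec.migrated = False


mds = MetadataServer()
mds.files["a.dat"] = FileRecord("a.dat", "vol1", migrated=True)
volume = mds.layout_get("a.dat", IoMode.READ)
```

In this sketch the event is recorded and the recall completes before the layout is returned, so the client never receives a location for data that is not yet on a BWFS volume.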

  • Journal of Computer Research and Development 01/2005; 42(6). DOI:10.1360/crad20050620
  • ABSTRACT: GPFS is IBM's parallel, shared-disk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters. GPFS is used on many of the largest supercomputers in the world. GPFS was built on many of the ideas that were developed in the academic community over the last several years, particularly distributed locking and recovery technology. To date it has been a matter of conjecture how well these ideas scale. We have had the opportunity to test those limits in the context of a product that runs on the largest systems in existence. While in many cases existing ideas scaled well, new approaches were necessary in many key areas. This paper describes GPFS, and discusses how distributed locking and recovery techniques were extended to scale to large clusters.
  • ABSTRACT: The high performance storage system (HPSS) provides scalable hierarchical storage management (HSM), archive, and file system services. Its design, implementation and current dominant use are focused on HSM and archive services. It is also a general-purpose, global, shared, parallel file system, potentially useful in other application domains. When HPSS design and implementation began over a decade ago, scientific computing power and storage capabilities at a site, such as a DOE national laboratory, were measured in a few tens of gigaops, data archived in HSMs in a few tens of terabytes at most, data throughput rates to an HSM in a few megabytes/s, and daily throughput with the HSM in a few gigabytes/day. At that time, the DOE national laboratories and IBM HPSS design team recognized that we were headed for a data storage explosion driven by computing power rising to teraops/petaops, requiring data stored in HSMs to rise to petabytes and beyond, data transfer rates with the HSM to rise to gigabytes/s and higher, and daily throughput with an HSM to rise to tens of terabytes/day. This paper discusses HPSS architectural, implementation and deployment experiences that contributed to its success in meeting the above orders of magnitude scaling targets. We also discuss areas that need additional attention as we continue significant scaling into the future.
    Mass Storage Systems and Technologies, 2005. Proceedings. 22nd IEEE / 13th NASA Goddard Conference on; 05/2005