Conference Paper

Enabling cross-layer optimizations in storage systems with custom metadata

DOI: 10.1145/1383422.1383451
Conference: Proceedings of the 17th International Symposium on High-Performance Distributed Computing (HPDC-17 2008), 23-27 June 2008, Boston, MA, USA
Source: DBLP

ABSTRACT Today, several data-storage systems allow applications to create and manage custom metadata to improve data search and navigability in large-scale storage systems. Our thesis is that, besides improving search and navigability, custom metadata can also serve as a two-way communication mechanism between applications and the storage layer to enable cross-layer optimizations in a uniform, application-independent and incremental fashion.

  • ABSTRACT: Peer-to-peer grids are large-scale, dynamic environments where autonomous sites share computing resources. Producing and maintaining relevant and up-to-date resource information in such environments is a challenging problem, due to the grid scale, the resource heterogeneity, and the variety of user demand. This work proposes a peer-to-peer annotation approach where users can freely annotate available resources as a solution to this problem. We advocate that the proposed approach (i) is scalable, as the job of updating the resource information is divided among users; (ii) will improve resource utilization, by reducing the number of resources that are allocated to users without matching their applications' constraints; and (iii) will allow resource allocators to increase users' utility, leveraging access to more detailed preference descriptions. The paper also discusses the challenges in implementing and deploying such an approach and presents solutions to tackle these challenges.
    Peer-to-Peer Computing, 2008. P2P '08. Eighth International Conference on; 10/2008
  • ABSTRACT: Versatile storage systems aim to maximize storage resource utilization by supporting the ability to 'morph' the storage system to best match the application's demands. To this end, versatile storage systems significantly extend the deployment- or run-time configurability of the storage system. This flexibility, however, introduces a new problem: a much larger, and potentially dynamic, configuration space makes manually configuring the storage system an undesirable if not infeasible task. This paper presents our initial progress towards answering the question: “How can we configure a distributed storage system (i.e., enable/disable its various optimizations and configure their parameters) with minimal human intervention?” We discuss why manually configuring the storage system is undesirable; present the success criteria for an automated configuration solution; propose a generic architecture that supports automated configuration; and, finally, instantiate this architecture into a first prototype, which controls the configuration of similarity detection optimizations in the MosaStore distributed storage system. Our evaluation results demonstrate that the prototype can provide performance close to the optimal configuration at the cost of minimal overhead.
    Grid Computing (GRID), 2010 11th IEEE/ACM International Conference on; 11/2010
  • ABSTRACT: In this work, we propose, implement and test a novel approach to the management of parallel I/O in high-performance computing. Our proposed approach is built upon three complementary ideas: (i) allowing users to place hints into the application code indicating high-level data access patterns, (ii) enabling an optimizing compiler to process these hints and develop I/O optimization strategies, and (iii) enhancing the I/O stack to accept these optimizations and process them across the different layers in the stack. We describe a general hint processing framework that accommodates this approach and demonstrate its potential by applying it to two sample problems: (i) shared storage cache management and (ii) I/O prefetching. In the former, our approach decides, at each program point of interest, the ideal set of data blocks to keep in shared storage caches in the I/O stack, and in the latter, the high-level data access pattern is propagated from the application layer to the parallel file system layer for prefetching data from the storage subsystem. Our approach is designed to complement and work synergistically with the MPI-IO and PVFS frameworks and exploits the characteristics of applications written using this software. We tested our approach using both synthetic data access patterns and disk-I/O-intensive application programs. The results collected indicate that the proposed approach improves over existing storage caching and I/O prefetching schemes by 28% and 66%, respectively.
    Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, HPDC 2010, Chicago, Illinois, USA, June 21-25, 2010; 01/2010
