Sudharshan Vazhkudai

Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States

Publications (72) · 13.7 Total Impact

  • Yang Liu, Raghul Gunasekaran, Xiaosong Ma, Sudharshan S. Vazhkudai
    ABSTRACT: Competing workloads on a shared storage system cause I/O resource contention and application performance vagaries. This problem is already evident in today's HPC storage systems and is likely to become acute at exascale. We need more interaction between application I/O requirements and system software tools to help alleviate the I/O bottleneck, moving towards I/O-aware job scheduling. However, this requires rich techniques to capture application I/O characteristics, which remain elusive in production systems. Traditionally, I/O characteristics have been obtained using client-side tracing tools, with drawbacks such as non-trivial instrumentation/development costs, large trace traffic, and inconsistent adoption. We present a novel approach, I/O Signature Identifier (IOSI), to characterize the I/O behavior of data-intensive applications. IOSI extracts signatures from noisy, zero-overhead server-side I/O throughput logs that are already collected on today's supercomputers, without interfering with the compiling/execution of applications. We evaluated IOSI using the Spider storage system at Oak Ridge National Laboratory, the S3D turbulence application (running on 18,000 Titan nodes), and benchmark-based pseudo-applications. Through our experiments we confirmed that IOSI effectively extracts an application's I/O signature despite significant server-side noise. Compared to client-side tracing tools, IOSI is transparent, interface-agnostic, and incurs no overhead. Compared to alternative data alignment techniques (e.g., dynamic time warping), it offers higher signature accuracy and shorter processing time.
    Proceedings of the 12th USENIX conference on File and Storage Technologies; 02/2014
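    The IOSI entry above identifies an application's I/O signature by correlating server-side throughput logs from multiple runs of the same job. As a minimal, hedged sketch of that idea (not the paper's actual algorithm), the Python below aligns each run's samples on job start time and averages them, so bursts caused by other tenants of the shared storage system wash out while the repeated, application-specific pattern remains. The function name, binning scheme, and mean-based reduction are illustrative assumptions.

    def extract_signature(runs, num_bins=100):
        """Toy I/O-signature extraction in the spirit of IOSI.

        runs: list of per-run samples, each a list of
        (seconds_since_job_start, throughput_gbps) tuples taken from
        the server-side logs of one execution of the application.
        """
        duration = max(t for run in runs for t, _ in run)
        bin_width = duration / num_bins
        sums, counts = [0.0] * num_bins, [0] * num_bins
        for run in runs:
            for t, gbps in run:
                i = min(int(t / bin_width), num_bins - 1)
                sums[i] += gbps
                counts[i] += 1
        # Averaging across runs suppresses contention noise from other
        # applications and keeps the recurring application pattern.
        return [s / c if c else 0.0 for s, c in zip(sums, counts)]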
  •
    ABSTRACT: Modern scientific discovery is increasingly driven by large-scale supercomputing simulations, followed by data analysis tasks. These data analyses are either performed offline, on smaller-scale clusters, or on the supercomputer itself. Unfortunately, these techniques suffer from performance and energy inefficiencies due to increased data movement between the compute and storage subsystems. Therefore, we propose Active Flash, an in-situ scientific data analysis approach, wherein data analysis is conducted on the solid-state device (SSD), where the data already resides. Our performance and energy models show that Active Flash has the potential to address many of the aforementioned concerns without degrading HPC simulation performance. In addition, we demonstrate an Active Flash prototype built on a commercial SSD controller, which further reaffirms the viability of our proposal.
    Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST'13). 01/2013;
  • H.M. Monti, A.R. Butt, S.S. Vazhkudai
    ABSTRACT: Innovative scientific applications and emerging dense data sources are creating a data deluge for high-end supercomputing systems. Modern applications are often collaborative in nature, with a distributed user base for input and output data sets. Processing such large input data typically involves copying (or staging) the data onto the supercomputer's specialized high-speed storage, scratch space, for sustained high I/O throughput. This copying is crucial as remotely accessing the data while an application executes results in unnecessary delays and consequently performance degradation. However, the current practice of conservatively staging data as early as possible makes the data vulnerable to storage failures, which may entail restaging and reduced job throughput. To address this, we present a timely staging framework that uses a combination of job start-up time predictions, user-specified volunteer or cloud-based intermediate storage nodes, and decentralized data delivery so that input data staging coincides with job start-up. Evaluation of our approach using both PlanetLab and Azure cloud services, as well as simulations based on three years of Jaguar supercomputer (No. 3 in Top500) job logs, shows as much as 91.0 percent reduction in staging times compared to direct transfers, 75.2 percent reduction in wait time on scratch, and 2.4 percent reduction in usage/hour. (An earlier version of this paper appears in [30].)
    IEEE Transactions on Parallel and Distributed Systems 01/2013; 24(9):1841-1851. · 1.80 Impact Factor
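    The timely-staging idea above boils down to a small scheduling calculation: start the input transfer late enough that it finishes just before the predicted job start, which shrinks the window in which staged data sits exposed to scratch failures. A minimal sketch under assumed inputs follows; the prediction, bandwidth estimate, and fixed safety margin are all placeholders, not the paper's predictor.

    def staging_start_time(predicted_job_start_s, data_size_gb,
                           est_bandwidth_gbps, safety_margin_s=600):
        """Latest sensible time (on the same clock as the prediction)
        to begin staging input data so it arrives shortly before the
        job is expected to start."""
        est_transfer_s = data_size_gb / est_bandwidth_gbps
        return predicted_job_start_s - est_transfer_s - safety_margin_s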
  •
    ABSTRACT: Modern scientific discovery often involves running complex application simulations on supercomputers, followed by a sequence of data analysis tasks on smaller clusters. This offline approach suffers from significant data movement costs such as redundant I/O, storage bandwidth bottleneck, and wasted CPU cycles, all of which contribute to increased energy consumption and delayed end-to-end performance. Technology projections for an exascale machine indicate that energy-efficiency will become the primary design metric. It is estimated that the energy cost of data movement will soon rival the cost of computation. Consequently, we can no longer ignore the data movement costs in data analysis. To address these challenges, we advocate executing data analysis tasks on emerging storage devices, such as SSDs. Typically, in extreme-scale systems, SSDs serve only as a temporary storage system for the simulation output data. In our approach, Active Flash, we propose to conduct in-situ data analysis on the SSD controller without degrading the performance of the simulation job. By migrating analysis tasks closer to where the data resides, it helps reduce the data movement cost. We present detailed energy and performance models for both active flash and offline strategies, and study them using extreme-scale application simulations, commonly used data analytics kernels, and supercomputer system configurations. Our evaluation suggests that active flash is a promising approach to alleviate the storage bandwidth bottleneck, reduce the data movement cost, and improve the overall energy efficiency.
    Proceedings of the 2012 USENIX conference on Power-Aware Computing and Systems; 10/2012
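    The entry above compares an offline analysis pipeline against in-situ analysis on the SSD controller using energy and performance models. A deliberately simplified, hedged version of that comparison is sketched below; the paper's models also account for simulation slowdown, staging bandwidth, and device counts, none of which are captured here, and every parameter name is a placeholder.

    def offline_energy_j(data_bytes, j_per_byte_moved, analysis_s, host_watts):
        # Offline strategy: pay to move the simulation output to an
        # analysis cluster, then pay for host-class compute power.
        return data_bytes * j_per_byte_moved + analysis_s * host_watts

    def active_flash_energy_j(analysis_s, controller_watts):
        # Active Flash strategy: analyze in place on the SSD controller,
        # avoiding bulk data movement and using a low-power processor.
        return analysis_s * controller_watts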
  • Chao Wang, Sudharshan S Vazhkudai, Xiaosong Ma, Christian Engelmann
    ABSTRACT: DRAM is a precious resource in extreme-scale machines and is increasingly becoming scarce, mainly due to the growing number of cores per node. On future multi-petaflop and exaflop machines, the memory pressure is likely to be so severe that we need to rethink our memory usage models. Fortunately, the advent of non-volatile memory (NVM) offers a unique opportunity in this space. Current NVM offerings possess several desirable properties, such as low cost and power efficiency, but suffer from high latency and lifetime issues. We need rich techniques to be able to use them alongside DRAM. In this talk, we present a novel approach for exploiting NVM as a secondary memory partition so that applications can explicitly allocate and manipulate memory regions therein. More specifically, we propose an NVMalloc library with a suite of services that enables applications to access a distributed NVM storage system. We have devised ways within NVMalloc so that the storage system, built from compute node-local NVM devices, can be accessed in a byte-addressable fashion using the memory-mapped I/O interface. Our approach has the potential to re-energize out-of-core computations on large-scale machines by having applications allocate certain variables through NVMalloc, thereby increasing the overall memory capacity available. Our evaluation on a 128-core cluster shows that NVMalloc enables applications to compute problem sizes larger than the physical memory in a cost-effective manner. The performance/efficiency gains grow with increased computation time between NVM memory accesses or with increased data access locality. In addition, our results suggest that while NVMalloc enables transparent access to NVM-resident variables, the explicit control it provides is crucial to optimize application performance.
    01/2012;
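    The NVMalloc entry above describes accessing node-local NVM in a byte-addressable fashion through the memory-mapped I/O interface. The sketch below illustrates only that access pattern, using POSIX mmap on a file assumed to live on an NVM-backed mount point; the path and region size are hypothetical, and this is not the NVMalloc API itself.

    import mmap, os

    NVM_BACKED_FILE = "/mnt/nvm/secondary_region.dat"  # assumed NVM-backed mount
    REGION_BYTES = 1 << 30                             # 1 GiB secondary "memory"

    # Size the backing file and map it, so the application can treat the
    # NVM region as extra, explicitly managed memory.
    fd = os.open(NVM_BACKED_FILE, os.O_CREAT | os.O_RDWR, 0o600)
    os.ftruncate(fd, REGION_BYTES)
    region = mmap.mmap(fd, REGION_BYTES)

    region[0:8] = (42).to_bytes(8, "little")   # byte-addressable store
    value = int.from_bytes(region[0:8], "little")

    region.flush()    # push dirty pages back to the device
    region.close()
    os.close(fd)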
  •
    ABSTRACT: Next-generation science will increasingly come to rely on the ability to perform efficient, on-the-fly analytics of data generated by high-performance computing (HPC) simulations, modeling complex physical phenomena. Scientific computing workflows are stymied by the traditional chaining of simulation and data analysis, creating multiple rounds of redundant reads and writes to the storage system, which grows in cost with the ever-increasing gap between compute and storage speeds in HPC clusters. Recent HPC acquisitions have introduced compute node-local flash storage as a means to alleviate this I/O bottleneck. We propose a novel approach, Active Flash, to expedite data analysis pipelines by migrating analysis to the location of the data, the flash device itself. We argue that Active Flash has the potential to enable true out-of-core data analytics by freeing up both the compute core and the associated main memory. By performing analysis locally, dependence on limited bandwidth to a central storage system is reduced, while allowing this analysis to proceed in parallel with the main application. In addition, offloading work from the host to the more power-efficient controller reduces peak system power usage, which is already in the megawatt range and poses a major barrier to HPC system scalability. We propose an architecture for Active Flash, explore energy and performance trade-offs in moving computation from host to storage, demonstrate the ability of appropriate embedded controllers to perform data analysis and reduction tasks at speeds sufficient for this application, and present a simulation study of Active Flash scheduling policies. These results show the viability of the Active Flash model, and its capability to potentially have a transformative impact on scientific data analysis.
    01/2012;
  •
    ABSTRACT: The exponential growth in user and application data entails new means for providing fault tolerance and protection against data loss. High Performance Computing (HPC) storage systems, which are at the forefront of handling the data deluge, typically employ hardware RAID at the backend. However, such solutions are costly, do not ensure end-to-end data integrity, and can become a bottleneck during data reconstruction. In this paper, we design an innovative solution to achieve a flexible, fault-tolerant, and high-performance RAID-6 solution for a parallel file system (PFS). Our system utilizes low-cost, strategically placed GPUs - both on the client and server sides - to accelerate parity computation. In contrast to hardware-based approaches, we provide full control over the size, length and location of a RAID array on a per file basis, end-to-end data integrity checking, and parallelization of RAID array reconstruction. We have deployed our system in conjunction with the widely-used Lustre PFS, and show that our approach is feasible and imposes acceptable overhead.
    01/2012;
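    The RAID-6 work above offloads parity computation to GPUs. As a reminder of what is being computed, the sketch below shows only the P parity, a byte-wise XOR across the data blocks of a stripe, on the CPU; the Q parity additionally requires Galois-field GF(2^8) arithmetic, and the paper's contribution is accelerating both on low-cost GPUs, which this toy version does not attempt.

    def raid6_p_parity(blocks):
        """P parity: byte-wise XOR of equal-length data blocks."""
        p = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                p[i] ^= b
        return bytes(p)

    # Any single lost block can be rebuilt by XOR-ing P with the survivors
    # (the Q parity covers a second, concurrent failure).
    data = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]), bytes([9, 10, 11, 12])]
    p = raid6_p_parity(data)
    assert raid6_p_parity([p, data[1], data[2]]) == data[0]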
  • NOBUGS 2010, Gatlinburg, TN; 10/2011
  • H.M. Monti, A.R. Butt, S.S. Vazhkudai
    ABSTRACT: Modern High-Performance Computing (HPC) centers are facing a data deluge from emerging scientific applications. Supporting large data entails a significant commitment of the high-throughput center storage system, scratch space. However, the scratch space is typically managed using simple “purge policies,” without sophisticated end-user data services to balance resource consumption and user serviceability. End-user data services such as offloading are performed using point-to-point transfers that are unable to reconcile the center's purge and users' delivery deadlines, are unable to adapt to changing dynamics in the end-to-end data path, and are not fault-tolerant. Such inefficiencies can be prohibitive to sustaining high performance. In this paper, we address the above issues by designing a framework for the timely, decentralized offload of application result data. Our framework uses an overlay of user-specified intermediate and landmark sites to orchestrate a decentralized fault-tolerant delivery. We have implemented our techniques within a production job scheduler (PBS) and data transfer tool (BitTorrent). Our evaluation using both a real implementation and supercomputer job-log-driven simulations shows that offloading times can be significantly reduced (by 90.4 percent for a 5 GB data transfer) and that the exposure window can be minimized while also meeting center-user service level agreements.
    IEEE Transactions on Parallel and Distributed Systems 09/2011; · 1.80 Impact Factor
  •
    ABSTRACT: Massively parallel scientific applications, running on extreme-scale supercomputers, produce hundreds of terabytes of data per run, driving the need for storage solutions to improve their I/O performance. Traditional parallel file systems (PFS) in high performance computing (HPC) systems are unable to keep up with such high data rates, creating a storage wall. In this work, we present a novel multi-tiered storage architecture comprising hybrid node-local resources to construct a dynamic data staging area for extreme-scale machines. Such a staging ground serves as an impedance-matching device between applications and the PFS. Our solution combines diverse resources (e.g., DRAM, SSD) in such a way as to approach the performance of the fastest component technology and the cost of the least expensive one. We have developed an automated provisioning algorithm that aids in meeting the checkpointing performance requirement of HPC applications, by using a least-cost storage configuration. We evaluate our approach using both an implementation on a large-scale cluster and a simulation driven by six years' worth of Jaguar supercomputer job logs, and show that our approach, by choosing an appropriate storage configuration, achieves 41.5% cost savings with only negligible impact on performance.
    2011 International Conference on Distributed Computing Systems, ICDCS 2011, Minneapolis, Minnesota, USA, June 20-24, 2011; 01/2011
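    The provisioning algorithm mentioned above picks a least-cost mix of node-local resources that still meets an application's checkpointing bandwidth requirement. The greedy selection and device parameters below are illustrative assumptions, not the paper's actual algorithm, but they convey the cost/performance trade-off being automated.

    def least_cost_config(devices, required_gbps):
        """devices: dicts with per-unit 'gbps', 'cost', and available 'units'.
        Take units in order of cost per GB/s until the aggregate bandwidth
        meets the checkpointing requirement."""
        ranked = sorted(devices, key=lambda d: d["cost"] / d["gbps"])
        chosen, total_gbps, total_cost = {}, 0.0, 0.0
        for dev in ranked:
            for _ in range(dev["units"]):
                if total_gbps >= required_gbps:
                    break
                chosen[dev["name"]] = chosen.get(dev["name"], 0) + 1
                total_gbps += dev["gbps"]
                total_cost += dev["cost"]
        if total_gbps < required_gbps:
            raise ValueError("requirement cannot be met with these resources")
        return chosen, total_gbps, total_cost

    # Hypothetical per-node resources (units, bandwidth, relative cost).
    print(least_cost_config(
        [{"name": "dram", "gbps": 6.0, "cost": 10.0, "units": 2},
         {"name": "ssd",  "gbps": 1.5, "cost": 2.0,  "units": 4}],
        required_gbps=8.0))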
  • 01/2011;
  • Henry M. Monti, Ali Raza Butt, Sudharshan S. Vazhkudai
    ABSTRACT: Modern High Performance Computing (HPC) applications process very large amounts of data. A critical research challenge lies in transporting input data to the HPC center from a number of distributed sources, e.g., scientific experiments and web repositories, and offloading the result data to geographically distributed, intermittently available end-users, often over under-provisioned connections. Such end-user data services are typically performed using point-to-point transfers that are designed for well-endowed sites, are unable to reconcile the center's resource usage and users' delivery deadlines, are unable to adapt to changing dynamics in the end-to-end data path, and are not fault-tolerant. To overcome these inefficiencies, decentralized HPC data services are emerging as viable alternatives. In this paper, we develop and enhance such distributed data services by designing CATCH, a Cloud-based Adaptive data Transfer serviCe for HPC. CATCH leverages a bevy of cloud storage resources to orchestrate a decentralized data transport with fail-over capabilities. Our results demonstrate that CATCH is a feasible approach, and can help improve the data transfer times at the HPC center by as much as 81.1% for typical HPC workloads.
    25th IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2011, Anchorage, Alaska, USA, 16-20 May, 2011 - Conference Proceedings; 01/2011
  • Mark L Green, Stephen D Miller, Sudharshan S Vazhkudai, James R Trater
    ABSTRACT: Large-scale neutron facilities such as the Spallation Neutron Source (SNS) located at Oak Ridge National Laboratory need easy-to-use access to Department of Energy Leadership Computing Facilities and experiment repository data. The Orbiter thick- and thin-client and its supporting Service Oriented Architecture (SOA) based services (available at https://orbiter.sns.gov) consist of standards-based components that are reusable and extensible for accessing high performance computing, data and computational grid infrastructure, and cluster-based resources easily from a user configurable interface. The primary Orbiter system goals consist of (1) developing infrastructure for the creation and automation of virtual instrumentation experiment optimization, (2) developing user interfaces for thin- and thick-client access, (3) provide a prototype incorporating major instrument simulation packages, and (4) facilitate neutron science community access and collaboration. The secure Orbiter SOA authentication and authorization is achieved through the developed Virtual File System (VFS) services, which use Role-Based Access Control (RBAC) for data repository file access, thin-and thick-client functionality and application access, and computational job workflow management. The VFS Relational Database Management System (RDMS) consists of approximately 45 database tables describing 498 user accounts with 495 groups over 432,000 directories with 904,077 repository files. Over 59 million NeXus file metadata records are associated to the 12,800 unique NeXus file field/class names generated from the 52,824 repository NeXus files. Services that enable (a) summary dashboards of data repository status with Quality of Service (QoS) metrics, (b) data repository NeXus file field/class name full text search capabilities within a Google like interface, (c) fully functional RBAC browser for the read-only data repository and shared areas, (d) user/group defined and shared metadata for data repository files, (e) user, group, repository, and web 2.0 based global positioning with additional service capabilities are currently available. The SNS based Orbiter SOA integration progress with the Distributed Data Analysis for Neutron Scattering Experiments (DANSE) software development project is summarized with an emphasis on DANSE Central Services and the Virtual Neutron Facility (VNF). Additionally, the DANSE utilization of the Orbiter SOA authentication, authorization, and data transfer services best practice implementations are presented.
    Journal of Physics Conference Series 12/2010; 251(1):012095.
  •
    ABSTRACT: In a busy world, continuing with the status quo, doing things the way we are already familiar with, often seems to be the most efficient way to conduct our work. We look for the value-add to decide if investing in a new method is worth the effort. How shall we evaluate if we have reached this tipping point for change? For contemporary researchers, understanding the properties of the data is a good starting point. The new generation of neutron scattering instruments being built are higher resolution and produce one or more orders of magnitude larger data than the previous generation of instruments. For instance, we have grown out of being able to perform some important tasks with our laptops – the data are too big and the computations would simply take too long. These large datasets can be problematic as facility users now begin to grapple with many of the same issues faced by more established computing communities. These issues include data access, management, and movement, data format standards, distributed computing, and collaboration among others. The Neutron Science Portal has been architected, designed, and implemented to provide users with an easy-to-use interface for managing and processing data, while also keeping an eye on meeting modern cybersecurity requirements imposed on institutions. The cost of entry for users has been lowered by utilizing a web interface providing access to backend portal resources. Users can browse or search for data which they are allowed to see, data reduction applications can be run without having to load the software, sample activation calculations can be performed for SNS and HFIR beamlines, McStas simulations can be run on TeraGrid and ORNL computers, and advanced analysis applications such as those being produced by the DANSE project can be run. Behind the scenes is a "live cataloging" system which automatically catalogs and archives experiment data via the data management system, and provides proposal team members access to their experiment data. The complexity of data movement and utilizing distributed computing resources has been taken care of on behalf of users. Collaboration is facilitated by providing users a read/writeable common area, shared across all experiment team members.
    Journal of Physics Conference Series 12/2010; 251(1):012096.
  •
    ABSTRACT: The unique contributions of the Neutron Science TeraGrid Gateway (NSTG) are the connection of national user facility instrument data sources to the integrated cyberinfrastructure of the National Science Foundation TeraGrid and the development of a neutron science gateway that allows neutron scientists to use TeraGrid resources to analyze their data, including comparison of experiment with simulation. The NSTG is working in close collaboration with the Spallation Neutron Source (SNS) at Oak Ridge as its principal facility partner. The SNS is a next-generation neutron source. It has completed construction at a cost of $1.4 billion and is ramping up operations. The SNS will provide an order of magnitude greater flux than any previous facility in the world and will be available to all of the nation's scientists, independent of funding source, on a peer-reviewed merit basis. With this new capability, the neutron science community is facing orders of magnitude larger data sets and is at a critical point for data analysis and simulation. There is a recognized need for new ways to manage and analyze data to optimize both beam time and scientific output. The TeraGrid is providing new capabilities in the gateway for simulations using McStas and a fitting service on distributed TeraGrid resources to improve turnaround. NSTG staff are also exploring replicating experimental data in archival storage. As part of the SNS partnership, the NSTG provides access to gateway support, cyberinfrastructure outreach, community development, and user support for the neutron science community. This community includes not only SNS staff and users but extends to all the major worldwide neutron scattering centers.
    Journal of Physics Conference Series 12/2010; 251(1):012097.
  •
    ABSTRACT: The new generation of neutron scattering instruments being built are higher resolution and produce one or more orders of magnitude larger data than the previous generation of instruments. For instance, we have grown out of being able to perform some important tasks with our laptops. The data sizes are too big and the computational time would be too long. These large datasets can be problematic as facility users now begin to struggle with many of the same issues faced by more established computing communities. These issues include data access, management, and movement, data format standards, distributed computing, and collaboration with others. The Neutron Science Portal has been designed and implemented to provide users with an easy-to-use interface for managing and processing data, while also keeping an eye on meeting modern computer security requirements that are currently being imposed on institutions. Users can browse or search for data which they are allowed to see, run data reduction and analysis applications, perform sample activation calculations and perform McStas simulations. Collaboration is facilitated by providing users a read/writeable common area, shared across all experiment team members. The portal currently has over 370 registered users, almost 7 TB of experiment and user data, approximately 1,000,000 files cataloged, and had almost 10,000 unique visits last year. Future directions for enhancing portal robustness include examining how to mirror data and portal services, better facilitation of collaborations via virtual organizations, enhancing disconnected service via "thick client" applications, and better inter-facility connectivity to support cross-cutting research.
    Journal of Physics Conference Series 11/2010; 247(1):012013.
  • H.M. Monti, A.R. Butt, S.S. Vazhkudai
    ABSTRACT: Innovative scientific applications and emerging dense data sources are creating a data deluge for high-end computing systems. Processing such large input data typically involves copying (or staging) the data onto the supercomputer's specialized high-speed storage, scratch space, for sustained high I/O throughput. The current practice of conservatively staging data as early as possible makes the data vulnerable to storage failures, which may entail re-staging and consequently reduced job throughput. To address this, we present a timely staging framework that uses a combination of job startup time predictions, user-specified intermediate nodes, and decentralized data delivery so that input data staging coincides with job start-up. By delaying staging to when it is necessary, the exposure to failures and their effects can be reduced. Evaluation using both PlanetLab and simulations based on three years of Jaguar (No. 1 in Top500) job logs shows as much as 85.9% reduction in staging times compared to direct transfers, 75.2% reduction in wait time on scratch, and 2.4% reduction in usage/hour.
    Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on; 05/2010
  •
    ABSTRACT: Scaling computations on emerging massive-core supercomputers is a daunting task, which coupled with the significantly lagging system I/O capabilities exacerbates applications' end-to-end performance. The I/O bottleneck often negates potential performance benefits of assigning additional compute cores to an application. In this paper, we address this issue via a novel functional partitioning (FP) runtime environment that allocates cores to specific application tasks -- checkpointing, de-duplication, and scientific data format transformation -- so that the deluge of cores can be brought to bear on the entire gamut of application activities. The focus is on utilizing the extra cores to support HPC application I/O activities and also leverage solid-state disks in this context. For example, our evaluation shows that dedicating 1 core on an oct-core machine for checkpointing and its assist tasks using FP can improve overall execution time of a FLASH benchmark on 80 and 160 cores by 43.95% and 41.34%, respectively.
    Conference on High Performance Computing Networking, Storage and Analysis, SC 2010, New Orleans, LA, USA, November 13-19, 2010; 01/2010
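    The functional-partitioning runtime above sets aside spare cores for application support tasks such as checkpointing so the main computation is not stalled by I/O. A minimal sketch of that idea, using a helper process pinned to a dedicated core with Linux CPU affinity, follows; the core number, queue hand-off, and checkpoint routine are illustrative assumptions rather than the FP runtime's interface.

    import os
    import multiprocessing as mp

    def checkpoint_worker(queue, dedicated_core):
        # Runs on a core reserved for I/O, draining checkpoint requests
        # so that compute cores never block on the file system.
        os.sched_setaffinity(0, {dedicated_core})   # Linux-only pinning
        while True:
            item = queue.get()
            if item is None:
                break
            path, data = item
            with open(path, "wb") as f:
                f.write(data)

    if __name__ == "__main__":
        queue = mp.Queue()
        worker = mp.Process(target=checkpoint_worker, args=(queue, 7))
        worker.start()
        # Compute processes hand off checkpoint buffers and continue at once.
        queue.put(("/tmp/ckpt_step_100.bin", b"\x00" * 1024))
        queue.put(None)   # sentinel: no more checkpoints
        worker.join()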
  • Sudharshan S Vazhkudai, Ali R Butt, Xiaosong Ma
    ABSTRACT: In this chapter, the authors present an overview of the utility of distributed storage systems in supporting modern applications that are increasingly becoming data intensive. Their coverage of distributed storage systems is based on the requirements imposed by data intensive computing and not a mere summary of storage systems. To this end, they delve into several aspects of supporting data-intensive analysis, such as data staging, offloading, checkpointing, and end-user access to terabytes of data, and illustrate the use of novel techniques and methodologies for realizing distributed storage systems therein. The data deluge from scientific experiments, observations, and simulations is affecting all of the aforementioned day-to-day operations in data-intensive computing. Modern distributed storage systems employ techniques that can help improve application performance, alleviate I/O bandwidth bottleneck, mask failures, and improve data availability. They present key guiding principles involved in the construction of such storage systems, associated tradeoffs, design, and architecture, all with an eye toward addressing challenges of data-intensive scientific applications. They highlight the concepts involved using several case studies of state-of-the-art storage systems that are currently available in the data-intensive computing landscape.
    01/2010;

Publication Stats

882 Citations
13.70 Total Impact Points

Institutions

  • 2004–2012
    • Oak Ridge National Laboratory
      • Computer Science and Mathematics Division
      Oak Ridge, Tennessee, United States
  • 2011
    • Virginia Polytechnic Institute and State University
      • Department of Computer Science
      Blacksburg, VA, United States
  • 2008
    • University of British Columbia - Vancouver
      • Department of Electrical and Computer Engineering
      Vancouver, British Columbia, Canada
  • 2005
    • North Carolina State University
      • Department of Computer Science
      Raleigh, NC, United States
  • 2000–2003
    • University of Mississippi
      • Department of Computer and Information Science
      Oxford, MS, United States
  • 2001
    • Monash University (Australia)
      Melbourne, Victoria, Australia