Abhishek Chandra

University of Minnesota Duluth, Duluth, Minnesota, United States

Publications (51) · 12.31 Total Impact

  • A. Chandra, J. Weissman, B. Heintz
    ABSTRACT: Cloud computing services are traditionally deployed on centralized computing infrastructures confined to a few data centers, while cloud applications run in a single data center. However, the cloud's centralized nature can be limiting in terms of performance and cost for applications where users, data, and computation are distributed. The authors present an overview of distributed clouds that might be better suited for such applications. They briefly describe the distributed cloud landscape and introduce Nebula, a highly decentralized cloud that uses volunteer edge resources. The authors provide insights into some of its key properties and design issues, and describe a distributed MapReduce application scenario to illustrate the benefits and trade-offs of using distributed and decentralized clouds for distributed data-intensive computing applications.
    IEEE Internet Computing 01/2013; 17(5):70-73. · 2.04 Impact Factor
  •
    ABSTRACT: Distributed data-intensive workflow applications are increasingly relying on and integrating remote resources, including community data sources, services, and computational platforms. Increasingly, these are made available as data, SaaS, and IaaS clouds. The execution of distributed data-intensive workflow applications can expose network bottlenecks between clouds that compromise performance. In this paper, we focus on alleviating network bottlenecks by using a proxy network. In particular, we show how proxies can eliminate network bottlenecks by smart routing and perform in-network computations to boost workflow application performance. A novel aspect of our work is the inclusion of multiple proxies to accelerate different workflow stages, optimizing different performance metrics. We show that the approach is effective for workflow applications and broadly applicable. Using Montage as an exemplar workflow application, results obtained through experiments on PlanetLab show how different proxies acting in a variety of roles can accelerate distinct stages of Montage. Our microbenchmarks also show that routing data through select proxies generally improves TCP/UDP bandwidth, delay, and jitter.
    Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International; 01/2013
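    To make the proxy-routing idea concrete, here is a minimal sketch (all node names, numbers, and the transfer-time model are illustrative assumptions, not the paper's mechanism): given measured per-hop bandwidth and latency, a transfer is sent directly or through whichever proxy minimizes estimated completion time.
    ```python
    # Hypothetical sketch: choose between the direct path and candidate proxies
    # for one workflow transfer, using measured per-hop bandwidth and latency.
    def path_time(data_mb, hops):
        """Estimate transfer time as total per-hop latency plus the time
        through the bottleneck (slowest) hop's bandwidth."""
        bottleneck_bw = min(bw for bw, _ in hops)        # MB/s
        total_latency = sum(lat for _, lat in hops)      # seconds
        return total_latency + data_mb / bottleneck_bw

    def choose_route(data_mb, direct, proxies):
        """direct: (bw, latency); proxies: {name: [(bw, lat), (bw, lat)]}
        for the source->proxy and proxy->destination hops."""
        best = ("direct", path_time(data_mb, [direct]))
        for name, hops in proxies.items():
            t = path_time(data_mb, hops)
            if t < best[1]:
                best = (name, t)
        return best

    direct = (2.0, 0.08)                                  # 2 MB/s, 80 ms
    proxies = {"proxy-east": [(20.0, 0.03), (15.0, 0.04)],
               "proxy-west": [(5.0, 0.10), (4.0, 0.12)]}
    print(choose_route(500, direct, proxies))             # -> ('proxy-east', ...)
    ```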
  •
    ABSTRACT: MapReduce has been designed to accommodate large-scale data-intensive workloads running on large single-site homogeneous clusters. Researchers have begun to explore the extent to which the original MapReduce assumptions can be relaxed, including skewed workloads, iterative applications, and heterogeneous computing environments. Our work continues this exploration by applying MapReduce across widely distributed data over distributed computation resources. This problem arises when datasets are generated at multiple sites, as is common in many scientific domains and, increasingly, e-commerce applications. It also occurs when multi-site resources such as geographically separated data centers are applied to the same MapReduce job. Using Hadoop, we show that the lack of network and node homogeneity and of data locality leads to poor performance, because the interaction between MapReduce phases becomes pronounced in the presence of heterogeneous network behavior. In this paper, we propose new cross-phase optimization techniques that enable independent MapReduce phases to influence one another. We propose techniques that optimize the push and map phases to enable push-map overlap and to allow map behavior to feed back into push dynamics. Similarly, we propose techniques that optimize the map and reduce phases to enable shuffle cost to feed back and affect map scheduling decisions. We evaluate the benefits of our techniques on both Amazon EC2 and PlanetLab. The experimental results show the potential of these techniques, as performance improves by 7%-18% depending on the execution environment and application.
    Cloud Engineering (IC2E), 2013 IEEE International Conference on; 01/2013
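    A toy illustration of cross-phase feedback (the cost model and all numbers are assumptions, not the paper's scheduler): each map task is placed on the node minimizing estimated push + map + shuffle time, so downstream shuffle cost influences map placement.
    ```python
    # Hypothetical sketch: shuffle cost feeds back into map-task placement.
    def task_cost(task, node):
        push = task["input_mb"] / node["push_bw"]        # push input to the node
        compute = task["input_mb"] / node["map_rate"]    # run the map function
        shuffle = task["output_mb"] / node["reduce_bw"]  # ship map output to reducers
        return push + compute + shuffle

    def schedule(tasks, nodes):
        return {t["id"]: min(nodes, key=lambda n: task_cost(t, n))["name"]
                for t in tasks}

    tasks = [{"id": "m1", "input_mb": 640, "output_mb": 64},    # shuffle-light task
             {"id": "m2", "input_mb": 320, "output_mb": 600}]   # shuffle-heavy task
    nodes = [{"name": "us-east", "push_bw": 50, "map_rate": 100, "reduce_bw": 10},
             {"name": "eu-west", "push_bw": 10, "map_rate": 120, "reduce_bw": 40}]
    print(schedule(tasks, nodes))   # m1 -> us-east, m2 -> eu-west
    ```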
  •
    ABSTRACT: Mobile devices, such as smart phones and tablets, are becoming the universal interface to online services and applications. However, such devices have limited computational power and battery life, which limits their ability to execute resource-intensive applications. Computation outsourcing to external resources has been proposed as a technique to alleviate this problem. Most existing work on mobile outsourcing has focused either on single-application optimization or on outsourcing to fixed, local resources, under the assumption that wide-area latency is prohibitively high. This neglects the opportunity to improve outsourcing performance by exploiting relationships among multiple applications and by optimizing server provisioning. In this paper, we present the design and implementation of an Android/Amazon EC2-based mobile application outsourcing framework, leveraging the cloud for scalability, elasticity, and multi-user code/data sharing. Using this framework, we empirically demonstrate that the cloud is not only feasible but desirable as an offloading platform for latency-tolerant applications. We propose data mining techniques to detect data sharing across multiple applications, and develop novel scheduling algorithms that exploit such data sharing for better outsourcing performance. Additionally, our platform is designed to scale dynamically to support a large number of concurrent mobile users. Experiments show that our proposed techniques and algorithms substantially improve application performance, while achieving high efficiency in computation resource and network usage.
    Cloud Computing (CLOUD), 2012 IEEE 5th International Conference on; 01/2012
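    As an illustration of the data-sharing idea (the cost model, names, and numbers are assumptions, not the framework's scheduler), an offloaded task can be sent to a server that already caches its input so the upload is skipped:
    ```python
    # Hypothetical sketch of data-sharing-aware offload scheduling.
    def offload_cost(task, server, uplink_mbps=8.0):
        if task["data_id"] in server["cache"]:
            upload = 0.0                                   # input already on the server
        else:
            upload = task["data_mb"] * 8 / uplink_mbps     # seconds to upload the input
        compute = task["cycles"] / server["speed"]         # seconds to execute remotely
        return upload + compute

    def assign(tasks, servers):
        plan = {}
        for task in tasks:
            best = min(servers, key=lambda s: offload_cost(task, s))
            best["cache"].add(task["data_id"])             # later tasks may reuse the data
            plan[task["id"]] = best["name"]
        return plan

    servers = [{"name": "ec2-a", "speed": 2e9, "cache": {"photo-42"}},
               {"name": "ec2-b", "speed": 3e9, "cache": set()}]
    tasks = [{"id": "t1", "data_id": "photo-42", "data_mb": 5, "cycles": 4e9},
             {"id": "t2", "data_id": "photo-42", "data_mb": 5, "cycles": 4e9}]
    print(assign(tasks, servers))
    ```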
  • Source
    Jinoh Kim, A. Chandra, J.B. Weissman
    ABSTRACT: Distributed computing applications are increasingly utilizing distributed data sources. However, the unpredictable cost of data access in large-scale computing infrastructures can lead to severe performance bottlenecks. Providing predictability in data access is thus essential to accommodate the large set of newly emerging large-scale, data-intensive computing applications. In this regard, accurate estimation of network performance is crucial to meeting the performance goals of such applications. Passive estimation based on past measurements is attractive for its relatively small overhead compared to explicit probing. In this paper, we take a passive approach to network performance estimation. Our approach differs from existing passive techniques, which rely either on past direct measurements between pairs of nodes or on topological similarities. Instead, we exploit secondhand measurements collected by other nodes without any topological restrictions. We present Overlay Passive Estimation of Network performance (OPEN), a scalable framework providing end-to-end network performance estimation based on secondhand measurements, and discuss how OPEN achieves cost-effective estimation in a large-scale infrastructure. Our extensive experimental results show that OPEN estimation is applicable to replica and resource selection, as commonly used in distributed computing.
    IEEE Transactions on Parallel and Distributed Systems 2011; 22:1365-1373. · 1.80 Impact Factor
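    The secondhand-measurement idea can be sketched as follows; the median aggregation and the interfaces are assumptions for illustration, not OPEN's actual estimator.
    ```python
    # Rough sketch: node A has never measured server B, so it estimates the likely
    # download bandwidth from B using measurements that *other* nodes reported about B.
    from collections import defaultdict
    from statistics import median

    class SecondhandEstimator:
        def __init__(self):
            self.samples = defaultdict(list)   # target -> [(reporter, mbps), ...]

        def report(self, reporter, target, mbps):
            self.samples[target].append((reporter, mbps))

        def estimate(self, target, exclude=None):
            obs = [m for r, m in self.samples[target] if r != exclude]
            return median(obs) if obs else None

    est = SecondhandEstimator()
    est.report("nodeC", "serverB", 12.0)
    est.report("nodeD", "serverB", 9.5)
    est.report("nodeE", "serverB", 11.0)
    print(est.estimate("serverB", exclude="nodeA"))   # -> 11.0 Mb/s
    ```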
  • M. Cardosa, A. Singh, H. Pucha, A. Chandra
    ABSTRACT: MapReduce is a distributed computing paradigm widely used for building large-scale data processing applications. When used in cloud environments, MapReduce clusters are dynamically created using virtual machines (VMs) and managed by the cloud provider. In this paper, we study the energy efficiency problem for such MapReduce clusters in private cloud environments that are characterized by repeated, batch execution of jobs. We describe a unique spatio-temporal tradeoff that includes efficient spatial fitting of VMs on servers to achieve high utilization of machine resources, as well as balanced temporal fitting of servers with VMs having similar runtimes to ensure a server runs at a high utilization throughout its uptime. We propose VM placement algorithms that explicitly incorporate these tradeoffs. Our algorithms achieve energy savings over existing placement techniques, and an additional optimization technique further achieves savings while simultaneously improving job performance.
    Cloud Computing (CLOUD), 2011 IEEE International Conference on; 08/2011
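    A minimal sketch of the spatio-temporal tradeoff (bucket sizes, capacities, and the first-fit rule are assumptions, not the paper's algorithms): VMs are grouped by similar expected runtime, then packed onto servers by CPU demand so each server stays busy for its whole uptime.
    ```python
    # Hypothetical spatio-temporal placement sketch.
    def place(vms, server_capacity=1.0, runtime_bucket=3600):
        # temporal fit: bucket VMs by expected runtime (e.g., 1-hour buckets)
        buckets = {}
        for vm in vms:
            buckets.setdefault(vm["runtime_s"] // runtime_bucket, []).append(vm)

        placement, servers = {}, []
        for bucket_id, group in buckets.items():
            for vm in sorted(group, key=lambda v: -v["cpu"]):     # spatial fit
                target = next((s for s in servers
                               if s["bucket"] == bucket_id
                               and s["used"] + vm["cpu"] <= server_capacity), None)
                if target is None:
                    target = {"id": "srv%d" % len(servers), "used": 0.0,
                              "bucket": bucket_id}
                    servers.append(target)
                target["used"] += vm["cpu"]
                placement[vm["id"]] = target["id"]
        return placement, len(servers)

    vms = [{"id": "vm1", "cpu": 0.5, "runtime_s": 3500},
           {"id": "vm2", "cpu": 0.4, "runtime_s": 3300},
           {"id": "vm3", "cpu": 0.6, "runtime_s": 7200}]
    print(place(vms))   # the two short-lived VMs share a server
    ```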
  • Source
    ABSTRACT: Current cloud infrastructures are attractive for their ease of use and performance. However, they suffer from several shortcomings. The main problem is inefficient data mobility due to the centralization of cloud resources. We believe such clouds are highly unsuited for dispersed-data-intensive applications, where the data may be spread across multiple geographical locations (e.g., distributed user blogs). Instead, we propose a new cloud model called Nebula: a dispersed, context-aware, and cost-effective cloud. We provide experimental evidence of the need for Nebulas using a distributed blog analysis application, and then describe the system architecture and components of our system.
    01/2011;
  • Source
    ABSTRACT: MapReduce is a highly-popular paradigm for high-performance computing over large data sets in large-scale platforms. However, when the source data is widely distributed and the computing platform is also distributed, e.g., data is collected in separate data center locations, the most efficient architecture for running Hadoop jobs over the entire data set becomes non-trivial. In this paper, we show the traditional single-cluster MapReduce setup may not be suitable for situations when data and compute resources are widely distributed. Further, we provide recommendations for alternative (and even hierarchical) distributed MapReduce setup configurations, depending on the workload and data set.
    01/2011;
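    A back-of-the-envelope comparison (all rates and the 5% aggregation factor are assumptions) of the two setups discussed above: ship all data to one cluster versus run MapReduce per site and combine only the partial results.
    ```python
    # Hypothetical comparison of single-cluster vs. hierarchical MapReduce.
    def single_cluster_time(sites, wan_mb_s, map_mb_s):
        remote = sum(s["data_mb"] for s in sites[1:])     # data pulled over the WAN
        total = sum(s["data_mb"] for s in sites)          # everything processed centrally
        return remote / wan_mb_s + total / map_mb_s

    def hierarchical_time(sites, wan_mb_s, map_mb_s, reduction=0.05):
        local = max(s["data_mb"] / map_mb_s for s in sites)         # sites run in parallel
        shipped = sum(s["data_mb"] * reduction for s in sites[1:])  # only aggregates cross the WAN
        return local + shipped / wan_mb_s

    sites = [{"data_mb": 10_000}, {"data_mb": 8_000}, {"data_mb": 12_000}]
    print(single_cluster_time(sites, wan_mb_s=50, map_mb_s=400))   # ~475 s
    print(hierarchical_time(sites, wan_mb_s=50, map_mb_s=400))     # ~50 s
    ```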
  • Source
    ABSTRACT: MapReduce has gained in popularity as a distributed data analysis paradigm, particularly in the cloud, where MapReduce jobs are run on virtual clusters. The provisioning of MapReduce jobs in the cloud is an important problem for optimizing several user- as well as provider-side metrics, such as runtime, cost, throughput, energy, and load. In this paper, we present an intelligent provisioning framework called STEAMEngine that consists of provisioning algorithms to optimize these metrics through a set of common building blocks. These building blocks enable spatio-temporal tradeoffs unique to MapReduce provisioning: along with its resource requirements (spatial component), a MapReduce job's runtime (temporal component) is a critical element for any provisioning algorithm. We also describe two novel provisioning algorithms, a user-driven performance optimization and a provider-driven energy optimization, that leverage these building blocks. Our experimental results based on an Amazon EC2 cluster and a local Xen/Hadoop cluster show the benefits of STEAMEngine through improvements in performance and energy via the use of these algorithms and building blocks.
    18th International Conference on High Performance Computing, HiPC 2011, Bengaluru, India, December 18-21, 2011; 01/2011
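    As a sketch of the user-driven building block (the Amdahl-style runtime model, prices, and parameters are placeholders, not STEAMEngine's models), one can provision the smallest cluster whose predicted runtime meets a deadline:
    ```python
    # Hypothetical deadline-driven provisioning sketch.
    def predicted_runtime(baseline_s, baseline_nodes, nodes, serial_frac=0.1):
        # Amdahl-style model: a fraction of the job does not parallelize
        parallel = baseline_s * (1 - serial_frac) * baseline_nodes / nodes
        return baseline_s * serial_frac + parallel

    def provision(baseline_s, baseline_nodes, deadline_s, max_nodes=64,
                  price_per_node_hour=0.10):
        for n in range(1, max_nodes + 1):
            t = predicted_runtime(baseline_s, baseline_nodes, n)
            if t <= deadline_s:
                cost = n * price_per_node_hour * t / 3600
                return {"nodes": n, "runtime_s": round(t), "cost_usd": round(cost, 3)}
        return None   # deadline not achievable within max_nodes

    # profiled: 2-hour run on 4 nodes; deadline: 1 hour
    print(provision(baseline_s=7200, baseline_nodes=4, deadline_s=3600))
    ```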
  • Source
    M. Cardosa, A. Chandra
    ABSTRACT: Resource discovery is an important process for finding suitable nodes that satisfy application requirements in large loosely coupled distributed systems. Besides internode heterogeneity, many of these systems also show a high degree of intranode dynamism, so that selecting nodes based only on their recently observed resource capacities can lead to poor deployment decisions resulting in application failures or migration overheads. However, most existing resource discovery mechanisms rely mainly on recent observations to achieve scalability in large systems. In this paper, we propose the notion of a resource bundle - a representative resource usage distribution for a group of nodes with similar resource usage patterns - that employs two complementary techniques to overcome the limitations of existing techniques: resource usage histograms to provide statistical guarantees for resource capacities and clustering-based resource aggregation to achieve scalability. Using trace-driven simulations and data analysis of a month-long PlanetLab trace, we show that resource bundles are able to provide high accuracy for statistical resource discovery, while achieving high scalability. We also show that resource bundles are ideally suited for identifying group-level characteristics (e.g., hot spots, total group capacity). To automatically parameterize the bundling algorithm, we present an adaptive algorithm that can detect online fluctuations in resource heterogeneity.
    IEEE Transactions on Parallel and Distributed Systems 09/2010; · 1.80 Impact Factor
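    A minimal sketch of the resource-bundle idea, with assumed bin edges, an L1 distance, and a naive greedy clustering standing in for the paper's algorithms: per-node usage histograms are grouped into bundles, and a statistical query is answered per bundle instead of per node.
    ```python
    # Hypothetical resource-bundle sketch over free-CPU samples.
    BINS = [0.0, 0.25, 0.5, 0.75, 1.01]   # free-CPU fraction bin edges

    def histogram(samples):
        counts = [0] * (len(BINS) - 1)
        for s in samples:
            for i in range(len(BINS) - 1):
                if BINS[i] <= s < BINS[i + 1]:
                    counts[i] += 1
                    break
        return [c / len(samples) for c in counts]

    def distance(h1, h2):
        return sum(abs(a - b) for a, b in zip(h1, h2))   # L1 histogram distance

    def bundle(nodes, eps=0.5):
        bundles = []   # each: {"members": [...], "hist": first member's histogram}
        for name, hist in nodes.items():
            match = next((b for b in bundles if distance(b["hist"], hist) < eps), None)
            if match is None:
                bundles.append({"members": [name], "hist": hist})
            else:
                match["members"].append(name)
        return bundles

    def satisfies(hist, min_free, prob):
        # probability mass in bins whose lower edge is at least min_free
        mass = sum(h for h, lo in zip(hist, BINS) if lo >= min_free)
        return mass >= prob

    nodes = {"nodeA": histogram([0.6, 0.7, 0.8, 0.55, 0.9]),
             "nodeB": histogram([0.65, 0.75, 0.6, 0.8, 0.85]),
             "nodeC": histogram([0.1, 0.2, 0.15, 0.3, 0.05])}
    for b in bundle(nodes):
        print(b["members"], "free CPU >= 0.5 with prob >= 0.9:",
              satisfies(b["hist"], 0.5, 0.9))
    ```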
  • Source
    ABSTRACT: Virtualization is being widely used in large-scale computing environments, such as clouds, data centers, and grids, to provide application portability and facilitate resource multiplexing while retaining application isolation. In many existing virtualized platforms, it has been found that the network bandwidth often becomes the bottleneck resource due to the hierarchical topology of the underlying network, causing both high network contention and reduced performance for communication and data-intensive applications. In this paper, we present a decentralized affinity-aware migration technique that incorporates heterogeneity and dynamism in network topology and job communication patterns to allocate virtual machines on the available physical resources. Our technique monitors network affinity between pairs of VMs and uses a distributed bartering algorithm, coupled with migration, to dynamically adjust VM placement such that communication overhead is minimized. Our experimental results running the Intel MPI benchmark and a scientific application on an 8-node Xen cluster show that we can get up to 42% improvement in the runtime of the application over a no-migration technique, while achieving up to 85% reduction in network communication cost. In addition, our technique is able to adjust to dynamic variations in communication patterns and provides both good performance and low network contention with minimal overhead.
    39th International Conference on Parallel Processing, ICPP 2010, San Diego, California, USA, 13-16 September 2010; 01/2010
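    The affinity idea can be sketched as follows (the brute-force swap evaluation shown here stands in for the paper's distributed bartering protocol, and the traffic numbers are made up): track pairwise VM traffic and accept a swap of two VMs if it reduces the traffic crossing host boundaries.
    ```python
    # Hypothetical affinity-aware swap evaluation.
    from itertools import combinations

    def cross_host_traffic(traffic, placement):
        return sum(mb for (a, b), mb in traffic.items()
                   if placement[a] != placement[b])

    def best_swap(traffic, placement):
        base = cross_host_traffic(traffic, placement)
        best = (None, base)
        for a, b in combinations(placement, 2):
            if placement[a] == placement[b]:
                continue                                   # swapping co-located VMs changes nothing
            trial = dict(placement, **{a: placement[b], b: placement[a]})
            cost = cross_host_traffic(traffic, trial)
            if cost < best[1]:
                best = ((a, b), cost)
        return best

    traffic = {("vm1", "vm2"): 500, ("vm1", "vm3"): 20, ("vm2", "vm4"): 30}
    placement = {"vm1": "host1", "vm2": "host2", "vm3": "host1", "vm4": "host2"}
    # swapping vm1 and vm4 co-locates the chatty vm1-vm2 pair
    print(best_swap(traffic, placement))
    ```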
  • Source
    David Boutcher, Abhishek Chandra
    ABSTRACT: We examine whether traditional disk I/O scheduling still provides benefits in a layered system consisting of virtualized operating systems and an underlying virtual machine monitor. We demonstrate that choosing the appropriate scheduling algorithm in the guest operating system provides performance benefits, while scheduling in the virtual machine monitor has no measurable advantage. We propose future areas for investigation, including schedulers optimized for running in a virtual machine or in a virtual machine monitor, and layered schedulers that optimize both application-level access and the underlying storage technology.
    Operating Systems Review. 01/2010; 44:20-24.
  • Source
    Michael Cardosa, Abhishek Chandra
    ABSTRACT: Resource discovery enables applications deployed in heterogeneous large-scale distributed systems to find resources that meet QoS requirements. In particular, most applications need resource requirements to be satisfied simultaneously for multiple resources (such as CPU, memory, and network bandwidth). Due to dynamism in many large-scale systems, providing statistical guarantees on such requirements is important to avoid application failures and overheads. However, existing techniques either provide guarantees only for individual resources, or take a static or memoryless approach along multiple dimensions. We present HiDRA, a scalable resource discovery technique providing statistical guarantees for resource requirements spanning multiple dimensions simultaneously. Through trace analysis and a 307-node PlanetLab implementation, we show that HiDRA, while using over 1,400 times less data, performs nearly as well as a fully-informed algorithm, achieving better precision and recall within 3% of it. We demonstrate that HiDRA is a feasible, low-overhead approach to statistical resource discovery in a distributed system.
    Quality of Service, 2009. IWQoS. 17th International Workshop on; 08/2009
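    To illustrate a multi-dimensional statistical guarantee (brute force over raw samples; HiDRA itself uses compact summaries, and all numbers here are made up): a node qualifies if its recent joint samples meet every requirement simultaneously with sufficient probability.
    ```python
    # Hypothetical joint-guarantee check across CPU, memory, and bandwidth.
    def satisfies(samples, req, prob=0.9):
        ok = sum(all(s[k] >= req[k] for k in req) for s in samples)
        return ok / len(samples) >= prob

    def discover(nodes, req, prob=0.9):
        return [name for name, samples in nodes.items()
                if satisfies(samples, req, prob)]

    nodes = {
        "nodeA": [{"cpu": 0.7, "mem": 0.6, "bw": 5.0},
                  {"cpu": 0.8, "mem": 0.5, "bw": 6.0},
                  {"cpu": 0.6, "mem": 0.7, "bw": 4.0}],
        "nodeB": [{"cpu": 0.9, "mem": 0.1, "bw": 8.0},   # plenty of CPU, little memory
                  {"cpu": 0.8, "mem": 0.2, "bw": 7.0},
                  {"cpu": 0.9, "mem": 0.1, "bw": 9.0}],
    }
    print(discover(nodes, {"cpu": 0.5, "mem": 0.4, "bw": 3.0}))   # -> ['nodeA']
    ```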
  • Source
    Abhishek Chandra, Jon Weissman
    ABSTRACT: Current cloud services are deployed on well-provisioned and centrally controlled infrastructures. However, there are several classes of services for which the current cloud model may not fit well: some do not need strong performance guarantees, the pricing may be too expensive for some, and some may be constrained by the data movement costs to the cloud. To satisfy the requirements of such services, we propose the idea of using distributed voluntary resources (those donated by end-user hosts) to form nebulas: more dispersed, less-managed clouds. We first discuss the requirements of cloud services and the challenges in meeting these requirements in such voluntary clouds. We then present some possible solutions to these challenges and also discuss opportunities for further improvements to make nebulas a viable cloud paradigm.
    06/2009;
  • Conference Paper: Virtual putty
    Jason Sonnek, Abhishek Chandra
    ABSTRACT: Virtualization is a key technology underlying cloud computing platforms, where applications encapsulated within virtual machines are dynamically mapped onto a pool of physical servers. In this paper, we argue that cloud providers can significantly lower operational costs, and improve hosted application performance, by accounting for affinities and conflicts between co-placed virtual machines. We show how these affinities can be inferred using location-independent VM characterizations called virtual footprints, and then show how these virtual footprints can be used to reshape the physical footprint of a VM (its physical resource consumption) to achieve higher VM consolidation and application performance in a cloud environment. We also identify three general principles for minimizing a virtual machine's physical footprint, and discuss challenges in applying these principles in practice.
    Proceedings of the 2009 conference on Hot topics in cloud computing; 06/2009
  • Source
    Jinoh Kim, Abhishek Chandra, Jon B. Weissman
    ABSTRACT: Large-scale distributed systems provide an attractive scalable infrastructure for network applications. However, the loosely coupled nature of this environment can make data access unpredictable and, in the limit, unavailable. We introduce the notion of accessibility to capture both availability and performance. An increasing number of data-intensive applications require job allocation to consider not only node computation power but also data accessibility. For instance, selecting a node with intolerably slow connections can offset any benefit of running on a fast node. In this paper, we present accessibility-aware resource selection techniques by which it is possible to choose nodes that will have efficient data access to remote data sources. We show that the local data access observations collected from a node's neighbors are sufficient to characterize accessibility for that node. By conducting trace-based, synthetic experiments on PlanetLab, we show that the resource selection heuristics guided by this principle significantly outperform conventional techniques such as latency-based or random allocations. The suggested techniques are also shown to be stable even under churn, despite the loss of prior observations.
    IEEE Transactions on Parallel and Distributed Systems 01/2009; 20:788-801. · 1.80 Impact Factor
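    A sketch of accessibility-aware selection under stated assumptions: a candidate's accessibility to the data source is approximated by the mean bandwidth its neighbors have observed, and the job goes to the node with the smallest estimated transfer-plus-compute time. The scoring model is illustrative, not the paper's heuristic.
    ```python
    # Hypothetical accessibility-aware node selection.
    from statistics import mean

    def completion_time(node, data_mb, work_units):
        neighbor_bw = mean(node["neighbor_obs_mb_s"])     # secondhand accessibility estimate
        return data_mb / neighbor_bw + work_units / node["compute_rate"]

    def select(nodes, data_mb=1000, work_units=500):
        return min(nodes, key=lambda n: completion_time(n, data_mb, work_units))

    nodes = [
        {"name": "fast-cpu-slow-net", "compute_rate": 50,
         "neighbor_obs_mb_s": [0.8, 1.2, 1.0]},
        {"name": "modest-cpu-good-net", "compute_rate": 25,
         "neighbor_obs_mb_s": [10.0, 12.0, 9.0]},
    ]
    print(select(nodes)["name"])   # the well-connected node wins despite slower CPU
    ```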
  • Source
    ABSTRACT: Supercomputers are prone to frequent faults that adversely affect their performance, reliability, and functionality. System logs collected on these systems are a valuable source of information about their operational status and health. However, their massive size, complexity, and lack of standard format make it difficult to automatically extract information that can be used to improve system management. In this work we propose a novel method to succinctly represent the contents of supercomputing logs, by using textual clustering to automatically find the syntactic structures of log messages. This information is used to automatically classify messages into semantic groups via an online clustering algorithm. Further, we describe a methodology for using the temporal proximity between groups of log messages to identify correlated events in the system. We apply our proposed methods to two large, publicly available supercomputing logs and show that our technique features nearly perfect accuracy for online log classification and extracts meaningful structural and temporal message patterns that can be used to improve the accuracy of other log analysis techniques.
    16th International Conference on High Performance Computing, HiPC 2009, December 16-19, 2009, Kochi, India, Proceedings; 01/2009
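    A simple stand-in for the syntactic-structure step (regex masking of variable tokens rather than the paper's online textual clustering; the log lines are invented): messages collapse onto templates and are grouped by template.
    ```python
    # Hypothetical log-template extraction by masking variable tokens.
    import re
    from collections import defaultdict

    VARIABLE = [
        (re.compile(r"\b\d{1,3}(\.\d{1,3}){3}\b"), "<IP>"),
        (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
        (re.compile(r"\b\d+\b"), "<NUM>"),
    ]

    def template(message):
        for pattern, token in VARIABLE:
            message = pattern.sub(token, message)
        return message

    def group(messages):
        groups = defaultdict(list)
        for msg in messages:
            groups[template(msg)].append(msg)
        return groups

    logs = [
        "node 17 link error at 0x3f2a",
        "node 205 link error at 0x99b0",
        "kernel panic on 10.0.3.17",
    ]
    for tpl, members in group(logs).items():
        print(len(members), tpl)
    ```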
  • Source
    J Kim, A Chandra, J.B. Weissman
    ABSTRACT: Large-scale distributed systems provide an attractive scalable infrastructure for network applications. However, the loosely coupled nature of this environment can make data access unpredictable and, in the limit, unavailable. We introduce the notion of accessibility to capture both availability and performance. An increasing number of data-intensive applications require job allocation to consider not only node computation power but also data accessibility. For instance, selecting a node with intolerably slow connections can offset any benefit of running on a fast node. In this paper, we present accessibility-aware resource selection techniques by which it is possible to choose nodes that will have efficient data access to remote data sources. We show that the local data access observations collected from a node's neighbors are sufficient to characterize accessibility for that node. We then present resource selection heuristics guided by this principle, and show that they significantly outperform standard techniques. The suggested techniques are also shown to be stable even under churn, despite the loss of prior observations.
    Distributed Computing Systems, 2008. ICDCS '08. The 28th International Conference on; 07/2008
  • Source
    M. Cardosa, A. Chandra
    ABSTRACT: Resource discovery is an important process for finding suitable nodes that satisfy application requirements in large loosely-coupled distributed systems. Besides inter-node heterogeneity, many of these systems also show a high degree of intra-node dynamism, so that selecting nodes based only on their recently observed resource capacities (as is typically done for scalability reasons) can lead to poor deployment decisions resulting in application failures or migration overheads. In this paper, we propose the notion of a resource bundle - a representative resource usage distribution for a group of nodes with similar resource usage patterns - that employs two complementary techniques to overcome the limitations of existing techniques: resource usage histograms to provide statistical guarantees for resource capacities, and clustering-based resource aggregation to achieve scalability. Using trace-driven simulations and data analysis of a month-long PlanetLab trace, we show that resource bundles are able to provide high accuracy for statistical resource discovery (up to 56% better precision than using only recent values), while achieving high scalability (up to 55% fewer messages than a non-aggregation algorithm). We also show that resource bundles are ideally suited for identifying group-level characteristics such as finding load hot spots and estimating total group capacity (within 8% of actual values).
    Distributed Computing Systems, 2008. ICDCS '08. The 28th International Conference on; 07/2008

Publication Stats

1k Citations
12.31 Total Impact Points

Institutions

  • 2003–2013
    • University of Minnesota Duluth
      • Department of Computer Science
      Duluth, Minnesota, United States
  • 2007–2011
    • University of Minnesota Twin Cities
      • Department of Computer Science and Engineering
      Minneapolis, MN, United States
  • 2000–2003
    • University of Massachusetts Amherst
      • School of Computer Science
      Amherst Center, MA, United States
  • 2001
    • FX Palo Alto Laboratory
      Palo Alto, California, United States