D. Epema

Technische Universiteit Delft, Delft, South Holland, Netherlands

Publications (11) · 2 Total impact

  • Conference Proceeding: IaaS Cloud Benchmarking: Approaches, Challenges, and Experience
    A. Iosup, R. Prodan, D. Epema
    ABSTRACT: Infrastructure-as-a-Service (IaaS) cloud computing is an emerging commercial infrastructure paradigm under which clients (users) can lease resources when and for how long needed, under a cost model that reflects the actual usage of resources by the client. For IaaS clouds to become mainstream technology and for current cost models to become more client-friendly, benchmarking and comparing the non-functional system properties of various IaaS clouds is important, especially for the cloud users. In this article we focus on the IaaS cloud-specific elements of benchmarking, from a user's perspective. We propose a generic approach for IaaS cloud benchmarking, discuss numerous challenges in developing this approach, and summarize our experience towards benchmarking IaaS clouds. We argue for an experimental approach that requires, among others, new techniques for experiment compression, new benchmarking methods that go beyond blackbox and isolated-user testing, new benchmark designs that are domain-specific, and new metrics for elasticity and variability.
    5th Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS 2012), held in conjunction with SC, Salt Lake City, Utah, USA; 11/2012
  • Conference Proceeding: Identifying, analyzing, and modeling flashcrowds in BitTorrent
    ABSTRACT: Flashcrowds - sudden surges of user arrivals - do occur in BitTorrent, and they can lead to severe service deprivation. However, very little is known about their occurrence patterns and their characteristics in real-world deployments, and many basic questions about BitTorrent flashcrowds, such as "How often do they occur?" and "How long do they last?", remain unanswered. In this paper, we address these questions by studying three datasets that cover millions of swarms from two of the largest BitTorrent trackers. We first propose a model for BitTorrent flashcrowds and a procedure for identifying, analyzing, and modeling BitTorrent flashcrowds. Then we evaluate quantitatively the impact of flashcrowds on BitTorrent users, and we develop an algorithm that identifies BitTorrent flashcrowds. Finally, we study statistically the properties of BitTorrent flashcrowds identified from our datasets, such as their arrival time, duration, and magnitude, and we investigate the relationship between flashcrowds and swarm growth, and the arrival rate of flashcrowds in BitTorrent trackers. In particular, we find that BitTorrent flashcrowds only occur in very small fractions (0.3-2%) of the swarms but that they can affect over ten million users.
    Peer-to-Peer Computing (P2P), 2011 IEEE International Conference on; 10/2011
  • Conference Proceeding: On the Performance Variability of Production Cloud Services
    A. Iosup, N. Yigitbasi, D. Epema
    ABSTRACT: Cloud computing is an emerging infrastructure paradigm that promises to eliminate the need for companies to maintain expensive computing hardware. Through the use of virtualization and resource time-sharing, clouds address with a single set of physical resources a large user base with diverse needs. Thus, clouds have the potential to provide their owners the benefits of an economy of scale and, at the same time, become an alternative for both the industry and the scientific community to self-owned clusters, grids, and parallel production environments. For this potential to become reality, the first generation of commercial clouds needs to be proven to be dependable. In this work we analyze the dependability of cloud services. Towards this end, we analyze long-term performance traces from Amazon Web Services and Google App Engine, currently two of the largest commercial clouds in production. We find that the performance of about half of the cloud services we investigate exhibits yearly and daily patterns, but also that most services have periods of especially stable performance. Last, through trace-based simulation we assess the impact of the variability observed for the studied cloud services on three large-scale applications: job execution in scientific computing, virtual goods trading in social networks, and state management in social gaming. We show that the impact of performance variability depends on the application, and give evidence that performance variability can be an important factor in cloud provider selection.
    Cluster, Cloud and Grid Computing (CCGrid), 2011 11th IEEE/ACM International Symposium on; 06/2011
  • Article: Grid Computing Workloads
    A. Iosup, D. Epema
    ABSTRACT: In the mid 1990s, the grid computing community promised the "compute power grid," a utility computing infrastructure for scientists and engineers. Since then, a variety of grids have been built worldwide, for academic purposes, specific application domains, and general production work. Understanding grid workloads is important for the design and tuning of future grid resource managers and applications, especially in the recent wake of commercial grids and clouds. This article presents an overview of the most important characteristics of grid workloads in the past seven years (2003-2010). Although grid user populations range from tens to hundreds of individuals, a few users dominate each grid's workload both in terms of consumed resources and the number of jobs submitted to the system. Real grid workloads include very few parallel jobs but many independent single-machine jobs (tasks) grouped into single "bags of tasks."
    IEEE Internet Computing; 05/2011 · 2.00 Impact Factor
  • Conference Proceeding: 2Fast: Collaborative Downloads in P2P Networks
    ABSTRACT: P2P systems that rely on the voluntary contribution of bandwidth by the individual peers may suffer from free riding. To address this problem, mechanisms enforcing fairness in bandwidth sharing have been designed, usually by limiting the download bandwidth to the available upload bandwidth. As in real environments the latter is much smaller than the former, these mechanisms severely affect the download performance of most peers. In this paper we propose a system called 2Fast, which solves this problem while preserving the fairness of bandwidth sharing. In 2Fast, we form groups of peers that collaborate in downloading a file on behalf of a single group member, which can thus use its full download bandwidth. A peer in our system can use its currently idle bandwidth to help other peers in their ongoing downloads, and get in return help during its own downloads. We assess the performance of 2Fast analytically and experimentally, the latter in both real and simulated environments. We find that in realistic bandwidth limit settings, 2Fast improves the download speed by up to a factor of 3.5 in comparison to state-of-the-art P2P download protocols.
    Peer-to-Peer Computing, 2006. P2P 2006. Sixth IEEE International Conference on; 10/2006
  • Conference Proceeding: Correlating Topology and Path Characteristics of Overlay Networks and the Internet
    ABSTRACT: Real-world IP applications such as Peer-to-Peer filesharing are now able to benefit from network and location awareness. It is therefore crucial to understand the relation between underlay and overlay networks and to characterize the behavior of real users with regard to the Internet. For this purpose, we have designed and implemented MULTIPROBE, a framework for large-scale P2P file-sharing measurements. Using this framework, we have performed measurements of BitTorrent, which is currently the P2P file sharing network with the largest amount of Internet traffic. We analyze and correlate these measurements to provide new insights into the topology, the connectivity, and the path characteristics of the Internet parts underlying P2P networks, as well as to present unique information on the BitTorrent throughput and connectivity.
    Cluster Computing and the Grid Workshops, 2006. Sixth IEEE International Symposium on; 06/2006
  • Conference Proceeding: GRENCHMARK: A Framework for Analyzing, Testing, and Comparing Grids
    A. Iosup, D. Epema
    ABSTRACT: Grid computing is becoming the natural way to aggregate and share large sets of heterogeneous resources. With the infrastructure becoming ready for the challenge, current grid development and acceptance hinge on proving that grids reliably support real applications, and on creating adequate benchmarks to quantify this support. However, grid applications are just beginning to emerge, and traditional benchmarks have yet to prove representative in grid environments. To address this chicken-and-egg problem, we propose a middle-way approach: create and run synthetic grid workloads comprising applications representative for today’s grids. For this purpose, we have designed and implemented GRENCHMARK, a framework for synthetic workload generation and submission. The framework greatly facilitates synthetic workload modeling, comes with over 35 synthetic and real applications, and is extensible and flexible. We show how the framework can be used for grid system analysis, functionality testing in grid environments, and for comparing different grid settings, and present the results obtained with GRENCHMARK in our multi-cluster grid, the DAS
    Cluster Computing and the Grid, 2006. CCGRID 06. Sixth IEEE International Symposium on; 06/2006
  • Article: Build-and-test workloads for grid middleware: problem, analysis, and applications
    A. Iosup, D. Epema
    ABSTRACT: The Grid promise is starting to materialize today: large-scale multi-site infrastructures have grown to assist the work of scientists from all around the world. This tremendous growth can be sustained and continued only through a higher quality of the middleware, in terms of deployability and of correct functionality. A potential solution to this problem is the adoption of industry practices regarding middleware building and testing. However, it is unclear what good build-and-test environments for grid middleware should look like, and how to use them efficiently. In this work we address both these problems. First, we study the characteristics of the NMI build-and-test environment, which handles millions of testing tasks annually, for major Grid middleware such as Condor, Globus, VDT, and gLite. Through the analysis of a system-wide trace covering the past two years we find the main characteristics of the workload, as well as the performance of the system under load. Second, we propose mechanisms for more efficient test management and operation, and for resource provisioning and evaluation. Notably, we propose a generic test optimization technique that reduces the test time by 95%, while achieving 93% of the maximum accuracy, under real conditions.
    Seventh IEEE International Symposium on Cluster Computing and the Grid, CCGRID 2007, Rio de Janeiro, Brazil, May 2007.
  • Article: Identifying and modeling flashcrowds in global bittorrent systems
  • Conference Proceeding: The peer-to-peer trace archive: Design and comparative trace analysis
    Proceedings of the ACM CoNEXT Student Workshop
  • Article: A scalable method for taking detailed and accurate geo* snapshots of large P2P networks

Institutions

  • 2006–2011
    • Technische Universiteit Delft
      • Faculty of Electrical Engineering, Mathematics and Computer Sciences (EEMCS)
      Delft, South Holland, Netherlands