D. H. J. Epema

Delft University of Technology | TU · Department of Computer Science

Doctor of Philosophy

About

247
Publications
64,573
Reads
11,083
Citations
Additional affiliations
July 2015 - May 2017
Delft University of Technology
Position
  • Professor
March 1984 - present
Delft University of Technology
Position
  • Professor (Associate)

Publications

Conference Paper
Full-text available
Existing digital identity management systems fail to deliver the desirable properties of control by the users of their own identity data, credibility of disclosed identity data, and network-level anonymity. The recently proposed Self-Sovereign Identity (SSI) approach promises to give users these properties. However, we argue that without addressing...
Preprint
Full-text available
Digital identity is essential to access services such as online banking, income tax portals, and online higher education. Digital identity is often outsourced to central digital identity providers, introducing a critical dependency. Self-Sovereign Identity gives citizens back the ownership of their own identity. However, proposed solutions concent...
Article
Full-text available
Cloud schedulers that allocate resources exclusively to single workflows are not work-conserving as they may be forced to leave gaps in their schedules because of the precedence constraints in the workflows. Thus, they may lead to a waste of financial resources. This problem can be mitigated by multiple-workflow schedulers that share the leased clo...
Article
Full-text available
Elasticity is one of the main features of cloud computing allowing customers to scale their resources based on the workload. Many autoscalers have been proposed in the past decade to decide on behalf of cloud customers when and how to provision resources to a cloud application based on the workload utilizing cloud elasticity features. However, in p...
Conference Paper
Full-text available
Efficient execution of distributed database operators such as joining and aggregating is critical for the performance of big data analytics. With the increase of the compute speedup of modern CPUs, reducing the network communication time of these operators in large systems is becoming increasingly important, and also challenging current techniques....
Conference Paper
Providing fault-tolerance is of major importance for data analytics frameworks such as Hadoop and Spark, which are typically deployed in large clusters that are known to experience high failure rates. Unexpected events such as compute node failures are in particular an important challenge for in-memory data analytics frameworks, as the widely adop...
Conference Paper
Simplifying the task of resource management and scheduling for customers, while still delivering complex Quality-of-Service (QoS), is key to cloud computing. Many autoscaling policies have been proposed in the past decade to decide on behalf of cloud customers when and how to provision resources to a cloud application utilizing cloud elasticity fea...
Article
Full-text available
Online gaming franchises such as World of Tanks, Defense of the Ancients, and StarCraft have attracted hundreds of millions of users who, apart from playing the game, also socialize with each other through gaming and viewing gamecasts. As a form of User Generated Content (UGC), gamecasts play an important role in user entertainment and gamer educat...
Article
Full-text available
The Dutch Advanced School for Computing and Imaging has built five generations of a 200-node distributed system over nearly two decades while remaining aligned with the shifting computer science research agenda. The system has supported years of award-winning research, underlining the benefits of investing in a smaller-scale, tailored design.
Conference Paper
Many large-scale data analytics infrastructures are employed for a wide variety of jobs, ranging from short interactive queries to large data analysis jobs that may take hours or even days to complete. As a consequence, data-processing frameworks like MapReduce may have workloads consisting of jobs with heavy-tailed processing requirements. With su...
Article
In recent years, many distributed graph-processing systems have been designed and developed to analyze large-scale graphs. For all distributed graph-processing systems, partitioning graphs is a key part of processing and an important aspect to achieve good processing performance. To keep the overhead of partitioning graphs low, even when processing...
Article
Full-text available
Multiplayer Online Games (MOGs) like Defense of the Ancients and StarCraft II have attracted hundreds of millions of users who communicate, interact, and socialize with each other through gaming. In MOGs, rich social relationships emerge and can be used to improve gaming services such as match recommendation and game population retention, which are...
Conference Paper
Full-text available
A well-known problem when executing data-intensive workloads with such frameworks as MapReduce is that small jobs with processing requirements counted in the minutes may suffer from the presence of huge jobs requiring hours or days of compute time, leading to a job slowdown distribution that is very variable and that is uneven across jobs of differ...
Article
Full-text available
In many aspects of human activity, there has been a continuous struggle between the forces of centralization and decentralization. Computing exhibits the same phenomenon; we have gone from mainframes to PCs and local networks in the past, and over the last decade we have seen a centralization and consolidation of services and applications in data c...
Article
Full-text available
Although Multi-Avatar Distributed Virtual Environments (MAVEs) such as Real-Time Strategy (RTS) games entertain daily hundreds of millions of online players, their current designs do not scale. For example, even popular RTS games such as the StarCraft series support in a single game instance only up to 16 players and only a few hundreds of avatars...
Conference Paper
Full-text available
Workflows are important computational tools in many branches of science, and because of the dependencies among their tasks and their widely different characteristics, scheduling them is a difficult problem. Most research on scheduling workflows has focused on the offline problem of minimizing the makespan of single workflows with known task runtim...
Article
Many fields of modern science require huge amounts of computation, and workflows are a very popular tool in e-Science since they allow many small, simple tasks to be organized to solve big problems. They are used in astronomy, bioinformatics, machine learning, social network analysis, physics, and many other branches of science. Workflows are notorious...
Article
Graph processing is increasingly used in knowledge economies and in science, in advanced marketing, social networking, bioinformatics, etc. A number of graph-processing systems, including the GPU-enabled Medusa and Totem, have been developed recently. Understanding their performance is key to system selection, tuning, and improvement. Previous perf...
Article
Data centers are at the core of a wide variety of daily ICT utilities, ranging from scientific computing to online gaming. Due to the scale of today's data centers, the failure of computing resources is a common occurrence that may disrupt the availability of ICT services, leading to revenue loss. Although many high availability (HA) techniques have...
Article
User interactions are indispensable for any online network to thrive, especially for BitTorrent-like and Web real-time communication-based distributed online networks that rely on users' collective contributions instead of the help of central servers. User interactions provide fine-grained information for many applications, such as security enhance...
Article
Accounting mechanisms based on credit are used in peer-to-peer systems to track the contribution of peers to the community for the purpose of deterring freeriding and rewarding good behavior. Most often, peers earn credit for uploading files, but other activities might be rewarded in the future as well, such as making useful comments or reporting s...
Conference Paper
Full-text available
Distributed reputation systems establish trust among strangers in online communities and provide incentives for users to contribute. In these systems, each user monitors the interactions of others and computes the reputations accordingly. Collecting information for computing the reputations is challenging for the users due to their vulnerability to...
Article
Distributed SQL Query Engines (DSQEs) are increasingly used in a variety of domains, but especially users in small companies with little expertise may face the challenge of selecting an appropriate engine for their specific applications. Although both industry and academia are attempting to come up with high level benchmarks, the performance of DSQ...
Conference Paper
Companies, scientific communities, and individual scientists with varying requirements for their compute-intensive applications may want to use public Infrastructure-as-a-Service clouds to increase the capacity of the resources they have access to. To enable such access, resource managers that currently act as gateways to clusters may also do so fo...
Article
Workload modeling and performance evaluation play crucial roles in the study of scheduling algorithms on large-scale parallel and distributed systems. An effective design of a scheduling algorithm for these systems requires experiments with hundreds of simulations to evaluate its performance. Since each simulation needs one workload as input, only...
Conference Paper
Full-text available
Running multiple instances of the MapReduce framework concurrently in a multicluster system or datacenter enables data, failure, and version isolation, which is attractive for many organizations. It may also provide some form of performance isolation, but in order to achieve this in the face of time-varying workloads submitted to the MapReduce inst...
Conference Paper
Full-text available
Running multiple instances of the MapReduce framework concurrently in a multicluster system or datacenter enables data, failure, and version isolation, which is attractive for many organizations. It may also provide some form of performance isolation, but in order to achieve this in the face of time-varying workloads submitted to the MapReduce inst...
Conference Paper
In many clusters and data centers, application frameworks are used that offer programming models such as Dryad and MapReduce, and jobs submitted to the clusters or data centers may be targeted at specific instances of these frameworks, for example because of the presence of certain data. An important question that then arises is how to allocate res...
Article
BitTorrent (BT) plays an important role in Internet content distribution. Because public BTs suffer from the free-rider problem, Darknets, which use Sharing Ratio Enforcement to increase their efficiency, are becoming increasingly popular. We crawled and traced 17 Darknets from September 2009 to February 2011, and obtained datasets about over 5 mill...
Conference Paper
In this paper we present the scaling of BTWorld, our MapReduce-based approach to observing and analyzing the global BitTorrent network which we have been monitoring for the past 4 years. BTWorld currently provides a comprehensive and complex set of queries implemented in Pig Latin, with data dependencies between them, which translate to several Map...
Article
The lack of privacy in P2P systems is an inherent characteristic of their design, as users have to expose their content interests. A variety of solutions have been proposed, offering several levels of protection to its users, from privacy to complete anonymity, but always at the cost of performance. However, most P2P users are reluctant to trade pe...
Conference Paper
Full-text available
The design and tuning of networked virtual environments (NVEs), such as World of Warcraft (WoW), require understanding the in-NVE mobility characteristics of their citizens. Although many mobility-aware NVE systems already exist, their validation and further development have been hampered by the lack of public datasets and of comparison studies bas...
Conference Paper
Full-text available
Technical universities, especially in Europe, are facing an important challenge in attracting more diverse groups of students, and in keeping the students they attract motivated and engaged in the curriculum. We describe our experience with gamification, which we loosely define as a teaching technique that uses social gaming elements to deliver hig...
Conference Paper
Peer-to-peer systems are a popular means of transferring files over the Internet, accounting for a third of the upload bandwidth of end users as of 2013. However, recent studies have highlighted that peer-to-peer systems are affected by a lack of balance between the supply and demand of bandwidth. This imbalance stems from the skewed popularity dis...
Conference Paper
The commoditization of big data analytics, that is, the deployment, tuning, and future development of big data processing platforms such as MapReduce, relies on a thorough understanding of relevant use cases and workloads. In this work we propose BTWorld, a use case for time-based big data analytics that is representative for processing data collec...
Conference Paper
Full-text available
Reputation systems are essential to establish trust and to provide incentives for cooperation among users in decentralized networks. In these systems, the most widely used algorithms for computing reputations are based on random walks. However, in decentralized networks where nodes have only a partial view of the system, random walk-based algorithm...
Conference Paper
Full-text available
Deploying applications in leased cloud infrastructure is increasingly considered by a variety of business and service integrators. However, the challenge of selecting the leasing strategy (larger or faster instances? on-demand or reserved instances? etc.) and of configuring the leasing strategy with appropriate scheduling policies is still dauntin...
Conference Paper
Full-text available
MapReduce, which is the de facto programming model for large-scale distributed data processing, and its most popular implementation Hadoop have enjoyed widespread adoption in industry during the past few years. Unfortunately, from a performance point of view getting the most out of Hadoop is still a big challenge due to the large number of configur...
Article
With the increasing presence, scale, and complexity of distributed systems, resource failures are becoming an important and practical topic of computer science research. While numerous failure models and failure-aware algorithms exist, their comparison has been hampered by the lack of public failure data sets and data processing tools. To facilitat...
Conference Paper
Full-text available
Massively Multiplayer Online Games (MMOGs) are an important type of distributed application and have millions of users. Traditionally, MMOGs are hosted on dedicated clusters, distributed globally. With the advent of cloud computing, MMOGs such as Zynga's are increasingly run on cloud resources, through the use of cloud technology and innovation. M...
Conference Paper
Full-text available
Scalable by design to very large computing systems such as grids and clouds, MapReduce is currently a major big data processing paradigm. Nevertheless, existing performance models for MapReduce only comply with specific workloads that process a small fraction of the entire data set, thus failing to assess the capabilities of the MapReduce paradigm...
Article
As peer‐to‐peer (P2P) file‐sharing systems revolve around cooperation, the design of upload incentives has been one of the most important topics in P2P research for more than a decade. Several deployed systems, such as private BitTorrent communities, successfully manage to foster cooperation by banning peers when their sharing ratio becomes too low...
Conference Paper
Full-text available
Spotify is a peer-assisted music streaming service that has gained worldwide popularity in the past few years. Until now, little has been published about user behavior in such services. In this paper, we study the user behavior in Spotify by analyzing a massive dataset collected between 2010 and 2011. Firstly, we investigate the system dynamics inc...
Conference Paper
P2P communities that use credits to incentivize their members to contribute have emerged over the last few years. In particular, private BitTorrent communities keep track of the total upload and download of each member and impose a minimum threshold for their upload/download ratio, which is known as their sharing ratio. It has been shown that these...
Article
The advent of Cloud computing as a new model of service provisioning in distributed systems encourages researchers to investigate its benefits and drawbacks on executing scientific applications such as workflows. One of the most challenging problems in Clouds is workflow scheduling, i.e., the problem of satisfying the QoS requirements of the user a...
Conference Paper
Many private BitTorrent communities employ Sharing Ratio Enforcement (SRE) schemes to incentivize users to contribute. It has been demonstrated that communities that adopt SRE are greatly oversupplied, i.e., they have much higher seeder-to-leecher ratios than communities in which SRE is not employed. Most previous studies focus on showing the posit...
Conference Paper
Reputation mechanisms are widely used in online networks to rank users or products, but despite their importance, very few studies have been done or published on their real behavior. In this paper, we study an Internet-deployed distributed reputation mechanism called BarterCast that is specifically designed for peer-to-peer file-sharing systems. Th...
Conference Paper
Full-text available
Infrastructure-as-a-Service (IaaS) cloud computing is an emerging commercial infrastructure paradigm under which clients (users) can lease resources when and for how long needed, under a cost model that reflects the actual usage of resources by the client. For IaaS clouds to become mainstream technology and for current cost models to become more cl...
Conference Paper
Full-text available
State-of-the-art MapReduce frameworks such as Hadoop can easily scale up to thousands of machines and to large numbers of users. Nevertheless, some users may require isolated environments to develop their applications and to process their data, which calls for multiple deployments of MR clusters within the same physical infrastructure. In this pape...
Conference Paper
In online reputation mechanisms, providing the system participants (peers) with the appropriate information on previous interactions is crucial for accurate reputation evaluations. A naive way of doing so is to provide all peers with all information, regardless of whether they need it or not, which may be very costly and not scalable. In this paper...
Conference Paper
Video distribution is nowadays the dominant source of Internet traffic, and recent studies show that it is expected to reach 90% of the global consumer traffic by the end of 2015. Peer-to-peer assisted solutions have been adopted by many content providers with the aim of improving the scalability and reliability of their distribution network. While...
Article
Full-text available
Recently, utility Grids have emerged as a new model of service provisioning in heterogeneous distributed systems. In this model, users negotiate with service providers on their required Quality of Service and on the corresponding price to reach a Service Level Agreement. One of the most challenging problems in utility Grids is workflow scheduling,...
Conference Paper
Due to the possibility of cheap identity creation, decentralized online reputation mechanisms are susceptible to sybil attacks. BarterCast is a reputation mechanism used in the Internet-deployed Tribler file-sharing client. In this paper we study the opportunities for sybil attacks in BarterCast and we devise a method for making BarterCast sybil...
Conference Paper
Full-text available
In decentralized interaction-based reputation systems, nodes store information about the past interactions of other nodes. Based on this information, they compute reputations in order to take decisions about future interactions. Computing the reputations with the complete history of interactions is inefficient due to its resource requirements. Furt...
Article
Future mobile multimedia systems will have wearable computing devices as their front ends, supported by database servers, I/O servers, and compute servers over a backbone network. Multimedia applications on such systems are demanding in terms of network and compute resources, and have stringent Quality of Service (QoS) requirements. Providing QoS h...
Article
Full-text available
Many peer-to-peer communities, including private BitTorrent communities that serve hundreds of thousands of users, utilize credit-based or sharing ratio enforcement schemes to incentivize their members to contribute. In this paper, we analyze the performance of such communities from both the system-level and the user-level perspectives. We show tha...
Conference Paper
Full-text available
The volume of Internet video is growing, and is expected to exceed 57 percent of global consumer Internet traffic by 2014. Peer-to-Peer technology can help deliver this massive volume of traffic in a cost-efficient, scalable, and reliable manner. However, single bit rate streaming is not sufficient given today's device and network connection div...
Conference Paper
Full-text available
Flashcrowds - sudden surges of user arrivals - do occur in BitTorrent, and they can lead to severe service deprivation. However, very little is known about their occurrence patterns and their characteristics in real-world deployments, and many basic questions about BitTorrent flashcrowds, such as "How often do they occur?" and "How long do they last?",...
Conference Paper
Full-text available
Many private BitTorrent communities employ Sharing Ratio Enforcement (SRE) schemes to incentivize users to contribute their upload resources. It has been demonstrated that communities that use SRE are greatly oversupplied, i.e., they have much higher seeder-to-leecher ratios than communities in which SRE is not employed. The first order effect of o...
Conference Paper
Full-text available
A considerable body of research shows that BitTorrent provides very efficient resource allocation inside single swarms. Many BitTorrent clients also allow users to participate in multiple swarms simultaneously, and implement inter-swarm resource-allocation mechanisms that are used by millions of people. However, resource allocation across multiple...
Conference Paper
Full-text available
Multi-cluster grids are widely employed to execute workloads consisting of compute- and data-intensive applications in both research and production environments. Such workloads, especially when they are bursty, may stress shared system resources, to the point where overload conditions occur. Overloads can severely degrade the system performance and...
Article
Full-text available
Cloud computing is an emerging commercial infrastructure paradigm that promises to eliminate the need for maintaining expensive computing facilities by companies and institutes alike. Through the use of virtualization and resource time sharing, clouds serve with a single set of physical resources a large user base with different needs. Thus, clouds...
Conference Paper
Full-text available
Cloud computing is an emerging infrastructure paradigm that promises to eliminate the need for companies to maintain expensive computing hardware. Through the use of virtualization and resource time-sharing, clouds address with a single set of physical resources a large user base with diverse needs. Thus, clouds have the potential to provide their...
Conference Paper
Full-text available
Interactive perception applications, such as gesture recognition and vision-based user interfaces, process high-data-rate streams with compute-intensive computer vision and machine learning algorithms. These applications can be represented as data flow graphs comprising several processing stages. Such applications require low latency to be interact...
Article
Full-text available
During the last decade of P2P research a lot of attention has been given to incentive mechanisms. While centralized incentive mechanisms are straightforward in their design, a long term challenge has been to create a decentralized incentive mechanism that can be used to effectively induce cooperation and reduce freeriding. While there have been man...
Conference Paper
Full-text available
In the Bartercast reputation mechanism of the BitTorrent-based P2P client Tribler, peers compute local, subjective reputations of other peers by applying a flow-based algorithm to a locally maintained Bartercast graph with peers as nodes and bandwidth contributions as edges. We have previously shown that the computed reputations are more accurate...
Article
Full-text available
In the mid 1990s, the grid computing community promised the "compute power grid," a utility computing infrastructure for scientists and engineers. Since then, a variety of grids have been built worldwide, for academic purposes, specific application domains, and general production work. Understanding grid workloads is important for the design and tu...
Conference Paper
Full-text available
Enhancing reciprocity has been one of the primary motivations for the design of incentive policies in BitTorrent-like P2P systems. Reciprocity implies that peers need to contribute their bandwidth to other peers if they want to receive bandwidth in return. However, the over-provisioning that characterizes today's BitTorrent communities and the deve...
Conference Paper
In this paper, we aim to provide insight into the limitations of the effectiveness of decentralized reputation-based incentive mechanisms in general. We focus on systems in which peers can be identified as freeriders based on reputation information, and analyze the maximal performance penalty that can be obtained given a certain consistency of the...
Conference Paper
These 2 keynote speeches discuss the following: P2P File Sharing: Past! - Present - Future?; and Information Society, Security and Ethics.
Conference Paper
Full-text available
Peer-to-Peer (P2P) systems have gained a phenomenal popularity in the past few years; BitTorrent serves daily tens of millions of people and generates an important fraction of the Internet traffic. Measurement data collected from real P2P systems are fundamental for gaining solid knowledge of the usage patterns and the characteristics of these syst...
Conference Paper
Full-text available
Recently, utility grids have emerged as a new model of service provisioning in heterogeneous distributed systems. In this model, users negotiate with providers on their required Quality of Service and on the corresponding price to reach a Service Level Agreement. One of the most challenging problems in utility grids is workflow scheduling, i.e., th...
Conference Paper
Full-text available
It is not uncommon that grid users observe highly variable performance when they submit similar workloads at different times. From the users' point of view, such inconsistent performance is undesirable, and it leads to user dissatisfaction and confusion. We tackle this performance inconsistency problem using overprovisioning, which is increasing the...
Conference Paper
Full-text available
Betweenness Centrality (BC) is a powerful metric for identifying central nodes in complex network analysis, but its computation in large and dynamic systems is costly. Most of the previous approximations for computing BC are either restricted to only one type of network, or are too computationally inefficient to be applied to large or dynamically...
Conference Paper
Full-text available
The analysis and modeling of the failures bound to occur in today's large-scale production systems is invaluable in providing the understanding needed to make these systems fault-tolerant yet efficient. Many previous studies have modeled failures without taking into account the time-varying behavior of failures, under the assumption that failures a...