D. H. J. Epema
Delft University of Technology | TU · Department of Computer Science
Doctor of Philosophy
About
247
Publications
64,573
Reads
11,083
Citations
Additional affiliations
July 2015 - May 2017
March 1984 - present
Publications (247)
Existing digital identity management systems fail to deliver the desirable properties of control by the users of their own identity data, credibility of disclosed identity data, and network-level anonymity. The recently proposed Self-Sovereign Identity (SSI) approach promises to give users these properties. However, we argue that without addressing...
Digital identity is essential to access services such as online banking, income tax portals, and online higher education. Digital identity is often outsourced to central digital identity providers, introducing a critical dependency. Self-Sovereign Identity gives citizens back ownership of their own identity. However, proposed solutions concent...
Cloud schedulers that allocate resources exclusively to single workflows are not work-conserving as they may be forced to leave gaps in their schedules because of the precedence constraints in the workflows. Thus, they may lead to a waste of financial resources. This problem can be mitigated by multiple-workflow schedulers that share the leased clo...
Elasticity is one of the main features of cloud computing allowing customers to scale their resources based on the workload. Many autoscalers have been proposed in the past decade to decide on behalf of cloud customers when and how to provision resources to a cloud application based on the workload utilizing cloud elasticity features. However, in p...
Efficient execution of distributed database operators such as joining and aggregating is critical for the performance of big data analytics. With the increase of the compute speedup of modern CPUs, reducing the network communication time of these operators in large systems is becoming increasingly important, and also challenging current techniques....
Providing fault tolerance is of major importance for data analytics frameworks such as Hadoop and Spark, which are typically deployed in large clusters that are known to experience high failure rates. Unexpected events such as compute node failures are in particular an important challenge for in-memory data analytics frameworks, as the widely adop...
Simplifying the task of resource management and scheduling for customers, while still delivering complex Quality-of-Service (QoS), is key to cloud computing. Many autoscaling policies have been proposed in the past decade to decide on behalf of cloud customers when and how to provision resources to a cloud application utilizing cloud elasticity fea...
Online gaming franchises such as World of Tanks, Defense of the Ancients, and StarCraft have attracted hundreds of millions of users who, apart from playing the game, also socialize with each other through gaming and viewing gamecasts. As a form of User Generated Content (UGC), gamecasts play an important role in user entertainment and gamer educat...
The Dutch Advanced School for Computing and Imaging has built five generations of a 200-node distributed system over nearly two decades while remaining aligned with the shifting computer science research agenda. The system has supported years of award-winning research, underlining the benefits of investing in a smaller-scale, tailored design.
Many large-scale data analytics infrastructures are employed for a wide variety of jobs, ranging from short interactive queries to large data analysis jobs that may take hours or even days to complete. As a consequence, data-processing frameworks like MapReduce may have workloads consisting of jobs with heavy-tailed processing requirements. With su...
In recent years, many distributed graph-processing systems have been designed and developed to analyze large-scale graphs. For all distributed graph-processing systems, partitioning graphs is a key part of processing and an important aspect to achieve good processing performance. To keep the overhead of partitioning graphs low, even when processing...
Multiplayer Online Games (MOGs) like Defense of the Ancients and StarCraft II have attracted hundreds of millions of users who communicate, interact, and socialize with each other through gaming. In MOGs, rich social relationships emerge and can be used to improve gaming services such as match recommendation and game population retention, which are...
A well-known problem when executing data-intensive workloads with such frameworks as MapReduce is that small jobs with processing requirements counted in the minutes may suffer from the presence of huge jobs requiring hours or days of compute time, leading to a job slowdown distribution that is very variable and that is uneven across jobs of differ...
In many aspects of human activity, there has been a continuous struggle between the forces of centralization and decentralization. Computing exhibits the same phenomenon; we have gone from mainframes to PCs and local networks in the past, and over the last decade we have seen a centralization and consolidation of services and applications in data c...
Although Multi-Avatar Distributed Virtual Environments (MAVEs) such as Real-Time Strategy (RTS) games entertain hundreds of millions of online players daily, their current designs do not scale. For example, even popular RTS games such as the StarCraft series support in a single game instance only up to 16 players and only a few hundred avatars...
Workflows are important computational tools in many branches of science, and because of the dependencies among their tasks and their widely different characteristics, scheduling them is a difficult problem. Most research on scheduling workflows has focused on the offline problem of minimizing the makespan of single workflows with known task runtim...
Many fields of modern science require huge amounts of computation, and workflows are a very popular tool in e-Science since they allow many small, simple tasks to be organized to solve big problems. They are used in astronomy, bioinformatics, machine learning, social network analysis, physics, and many other branches of science. Workflows are notorious...
Graph processing is increasingly used in knowledge economies and in science, in advanced marketing, social networking, bioinformatics, etc. A number of graph-processing systems, including the GPU-enabled Medusa and Totem, have been developed recently. Understanding their performance is key to system selection, tuning, and improvement. Previous perf...
Data centers are at the core of a wide variety of daily ICT utilities, ranging from scientific computing to online gaming. Due to the scale of today's data centers, the failure of computing resources is a common occurrence that may disrupt the availability of ICT services, leading to revenue loss. Although many high availability (HA) techniques have...
User interactions are indispensable for any online network to thrive, especially for BitTorrent-like and Web real-time communication-based distributed online networks that rely on users' collective contributions instead of the help of central servers. User interactions provide fine-grained information for many applications, such as security enhance...
Accounting mechanisms based on credit are used in peer-to-peer systems to track the contribution of peers to the community for the purpose of deterring freeriding and rewarding good behavior. Most often, peers earn credit for uploading files, but other activities might be rewarded in the future as well, such as making useful comments or reporting s...
Distributed reputation systems establish trust among strangers in online communities and provide incentives for users to contribute. In these systems, each user monitors the interactions of others and computes the reputations accordingly. Collecting information for computing the reputations is challenging for the users due to their vulnerability to...
Distributed SQL Query Engines (DSQEs) are increasingly used in a variety of domains, but especially users in small companies with little expertise may face the challenge of selecting an appropriate engine for their specific applications. Although both industry and academia are attempting to come up with high level benchmarks, the performance of DSQ...
Companies, scientific communities, and individual scientists with varying requirements for their compute-intensive applications may want to use public Infrastructure-as-a-Service clouds to increase the capacity of the resources they have access to. To enable such access, resource managers that currently act as gateways to clusters may also do so fo...
Workload modeling and performance evaluation play crucial roles in the study of scheduling algorithms on large-scale parallel and distributed systems. An effective design of a scheduling algorithm for these systems requires experiments with hundreds of simulations to evaluate its performance. Since each simulation needs one workload as input, only...
Running multiple instances of the MapReduce framework concurrently in a multicluster system or datacenter enables data, failure, and version isolation, which is attractive for many organizations. It may also provide some form of performance isolation, but in order to achieve this in the face of time-varying workloads submitted to the MapReduce inst...
Running multiple instances of the MapReduce framework concurrently in a multicluster system or datacenter enables data, failure, and version isolation, which is attractive for many organizations. It may also provide some form of performance isolation, but in order to achieve this in the face of time-varying workloads submitted to the MapReduce inst...
In many clusters and data centers, application frameworks are used that offer programming models such as Dryad and MapReduce, and jobs submitted to the clusters or data centers may be targeted at specific instances of these frameworks, for example because of the presence of certain data. An important question that then arises is how to allocate res...
BitTorrent (BT) plays an important role in Internet content distribution. Because public BTs suffer from the free-rider problem, Darknets are becoming increasingly popular, which use Sharing Ratio Enforcement to increase their efficiency. We crawled and traced 17 Darknets from September 2009 to February 2011, and obtained datasets about over 5 mill...
In this paper we present the scaling of BTWorld, our MapReduce-based approach to observing and analyzing the global BitTorrent network which we have been monitoring for the past 4 years. BTWorld currently provides a comprehensive and complex set of queries implemented in Pig Latin, with data dependencies between them, which translate to several Map...
The lack of privacy in P2P systems is an inherent characteristic of their design, as users have to expose their content interests. A variety of solutions have been proposed, offering several levels of protection to their users, from privacy to complete anonymity, but always at the cost of performance. However, most P2P users are reluctant to trade pe...
The design and tuning of networked virtual environments (NVEs), such as World of Warcraft (WoW), require understanding the in-NVE mobility characteristics of their citizens. Although many mobility-aware NVE systems already exist, their validation and further development have been hampered by the lack of public datasets and of comparison studies bas...
Technical universities, especially in Europe, are facing an important challenge in attracting more diverse groups of students, and in keeping the students they attract motivated and engaged in the curriculum. We describe our experience with gamification, which we loosely define as a teaching technique that uses social gaming elements to deliver hig...
Peer-to-peer systems are a popular means of transferring files over the Internet, accounting for a third of the upload bandwidth of end users as of 2013. However, recent studies have highlighted that peer-to-peer systems are affected by a lack of balance between the supply and demand of bandwidth. This imbalance stems from the skewed popularity dis...
The commoditization of big data analytics, that is, the deployment, tuning, and future development of big data processing platforms such as MapReduce, relies on a thorough understanding of relevant use cases and workloads. In this work we propose BTWorld, a use case for time-based big data analytics that is representative for processing data collec...
Reputation systems are essential to establish trust and to provide incentives for cooperation among users in decentralized networks. In these systems, the most widely used algorithms for computing reputations are based on random walks. However, in decentralized networks where nodes have only a partial view of the system, random walk-based algorithm...
Deploying applications in leased cloud infrastructure is increasingly considered by a variety of business and service integrators. However, the challenge of selecting the leasing strategy (larger or faster instances? on-demand or reserved instances? etc.) and of configuring the leasing strategy with appropriate scheduling policies is still dauntin...
MapReduce, which is the de facto programming model for large-scale distributed data processing, and its most popular implementation Hadoop have enjoyed widespread adoption in industry during the past few years. Unfortunately, from a performance point of view getting the most out of Hadoop is still a big challenge due to the large number of configur...
With the increasing presence, scale, and complexity of distributed systems, resource failures are becoming an important and practical topic of computer science research. While numerous failure models and failure-aware algorithms exist, their comparison has been hampered by the lack of public failure data sets and data processing tools. To facilitat...
Massively Multiplayer Online Games (MMOGs) are an important type of distributed applications and have millions of users. Traditionally, MMOGs are hosted on dedicated clusters, distributed globally. With the advent of cloud computing, MMOGs such as Zynga's are increasingly run on cloud resources, through the use of cloud technology and innovation. M...
Scalable by design to very large computing systems such as grids and clouds, MapReduce is currently a major big data processing paradigm. Nevertheless, existing performance models for MapReduce only comply with specific workloads that process a small fraction of the entire data set, thus failing to assess the capabilities of the MapReduce paradigm...
As peer-to-peer (P2P) file-sharing systems revolve around cooperation, the design of upload incentives has been one of the most important topics in P2P research for more than a decade. Several deployed systems, such as private BitTorrent communities, successfully manage to foster cooperation by banning peers when their sharing ratio becomes too low...
Spotify is a peer-assisted music streaming service that has gained worldwide popularity in the past few years. Until now, little has been published about user behavior in such services. In this paper, we study the user behavior in Spotify by analyzing a massive dataset collected between 2010 and 2011. Firstly, we investigate the system dynamics inc...
P2P communities that use credits to incentivize their members to contribute have emerged over the last few years. In particular, private BitTorrent communities keep track of the total upload and download of each member and impose a minimum threshold for their upload/download ratio, which is known as their sharing ratio. It has been shown that these...
The advent of Cloud computing as a new model of service provisioning in distributed systems encourages researchers to investigate its benefits and drawbacks on executing scientific applications such as workflows. One of the most challenging problems in Clouds is workflow scheduling, i.e., the problem of satisfying the QoS requirements of the user a...
Many private BitTorrent communities employ Sharing Ratio Enforcement (SRE) schemes to incentivize users to contribute. It has been demonstrated that communities that adopt SRE are greatly oversupplied, i.e., they have much higher seeder-to-leecher ratios than communities in which SRE is not employed. Most previous studies focus on showing the posit...
Reputation mechanisms are widely used in online networks to rank users or products, but despite their importance, very few studies have been done or published on their real behavior. In this paper, we study an Internet-deployed distributed reputation mechanism called BarterCast that is specifically designed for peer-to-peer file-sharing systems. Th...
Infrastructure-as-a-Service (IaaS) cloud computing is an emerging commercial infrastructure paradigm under which clients (users) can lease resources when and for how long needed, under a cost model that reflects the actual usage of resources by the client. For IaaS clouds to become mainstream technology and for current cost models to become more cl...
State-of-the-art MapReduce frameworks such as Hadoop can easily scale up to thousands of machines and to large numbers of users. Nevertheless, some users may require isolated environments to develop their applications and to process their data, which calls for multiple deployments of MR clusters within the same physical infrastructure. In this pape...
In online reputation mechanisms, providing the system participants (peers) with the appropriate information on previous interactions is crucial for accurate reputation evaluations. A naive way of doing so is to provide all peers with all information, regardless of whether they need it or not, which may be very costly and not scalable. In this paper...
Video distribution is nowadays the dominant source of Internet traffic, and recent studies show that it is expected to reach 90% of the global consumer traffic by the end of 2015. Peer-to-peer assisted solutions have been adopted by many content providers with the aim of improving the scalability and reliability of their distribution network. While...
Recently, utility Grids have emerged as a new model of service provisioning in heterogeneous distributed systems. In this model, users negotiate with service providers on their required Quality of Service and on the corresponding price to reach a Service Level Agreement. One of the most challenging problems in utility Grids is workflow scheduling,...
Due to the possibility of cheap identity creation, decentralized online reputation mechanisms are susceptible to sybil attacks. BarterCast is a reputation mechanism used in the Internet-deployed Tribler file-sharing client. In this paper we study the opportunities for sybil attacks in BarterCast and we devise a method for making BarterCast sybil...
In decentralized interaction-based reputation systems, nodes store information about the past interactions of other nodes. Based on this information, they compute reputations in order to take decisions about future interactions. Computing the reputations with the complete history of interactions is inefficient due to its resource requirements. Furt...
Future mobile multimedia systems will have wearable computing devices as their front ends, supported by database servers, I/O servers, and compute servers over a backbone network. Multimedia applications on such systems are demanding in terms of network and compute resources, and have stringent Quality of Service (QoS) requirements. Providing QoS h...
Many peer-to-peer communities, including private BitTorrent communities that serve hundreds of thousands of users, utilize credit-based or sharing ratio enforcement schemes to incentivize their members to contribute. In this paper, we analyze the performance of such communities from both the system-level and the user-level perspectives. We show tha...
The volume of Internet video is growing, and is expected to exceed 57 percent of global consumer Internet traffic by 2014. Peer-to-Peer technology can help deliver this massive volume of traffic in a cost-efficient, scalable, and reliable manner. However, single bit rate streaming is not sufficient given today's device and network connection div...
Flashcrowds - sudden surges of user arrivals - do occur in BitTorrent, and they can lead to severe service deprivation. However, very little is known about their occurrence patterns and their characteristics in real-world deployments, and many basic questions about BitTorrent flashcrowds, such as How often do they occur? and How long do they last?,...
Many private BitTorrent communities employ Sharing Ratio Enforcement (SRE) schemes to incentivize users to contribute their upload resources. It has been demonstrated that communities that use SRE are greatly oversupplied, i.e., they have much higher seeder-to-leecher ratios than communities in which SRE is not employed. The first order effect of o...
A considerable body of research shows that BitTorrent provides very efficient resource allocation inside single swarms. Many BitTorrent clients also allow users to participate in multiple swarms simultaneously, and implement inter-swarm resource-allocation mechanisms that are used by millions of people. However, resource allocation across multiple...
Multi-cluster grids are widely employed to execute workloads consisting of compute- and data-intensive applications in both research and production environments. Such workloads, especially when they are bursty, may stress shared system resources, to the point where overload conditions occur. Overloads can severely degrade the system performance and...
Cloud computing is an emerging commercial infrastructure paradigm that promises to eliminate the need for maintaining expensive computing facilities by companies and institutes alike. Through the use of virtualization and resource time sharing, clouds serve with a single set of physical resources a large user base with different needs. Thus, clouds...
Cloud computing is an emerging infrastructure paradigm that promises to eliminate the need for companies to maintain expensive computing hardware. Through the use of virtualization and resource time-sharing, clouds address with a single set of physical resources a large user base with diverse needs. Thus, clouds have the potential to provide their...
Interactive perception applications, such as gesture recognition and vision-based user interfaces, process high-data rate streams with compute intensive computer vision and machine learning algorithms. These applications can be represented as data flow graphs comprising several processing stages. Such applications require low latency to be interact...
During the last decade of P2P research a lot of attention has been given to incentive mechanisms. While centralized incentive mechanisms are straightforward in their design, a long term challenge has been to create a decentralized incentive mechanism that can be used to effectively induce cooperation and reduce freeriding. While there have been man...
In the Bartercast reputation mechanism of the BitTorrent-based P2P client Tribler, peers compute local, subjective reputations of other peers by applying a flow-based algorithm to a locally maintained Bartercast graph with peers as nodes and bandwidth contributions as edges. We have previously shown that the computed reputations are more accurate...
In the mid 1990s, the grid computing community promised the "compute power grid," a utility computing infrastructure for scientists and engineers. Since then, a variety of grids have been built worldwide, for academic purposes, specific application domains, and general production work. Understanding grid workloads is important for the design and tu...
Enhancing reciprocity has been one of the primary motivations for the design of incentive policies in BitTorrent-like P2P systems. Reciprocity implies that peers need to contribute their bandwidth to other peers if they want to receive bandwidth in return. However, the over-provisioning that characterizes today's BitTorrent communities and the deve...
In this paper, we aim to provide insight into the limitations of the effectiveness of decentralized reputation-based incentive mechanisms in general. We focus on systems in which peers can be identified as freeriders based on reputation information, and analyze the maximal performance penalty that can be obtained given a certain consistency of the...
These 2 keynote speeches discuss the following: P2P File Sharing: Past! - Present - Future?; and Information Society, Security and Ethics.
Peer-to-Peer (P2P) systems have gained a phenomenal popularity in the past few years; BitTorrent serves daily tens of millions of people and generates an important fraction of the Internet traffic. Measurement data collected from real P2P systems are fundamental for gaining solid knowledge of the usage patterns and the characteristics of these syst...
Recently, utility grids have emerged as a new model of service provisioning in heterogeneous distributed systems. In this model, users negotiate with providers on their required Quality of Service and on the corresponding price to reach a Service Level Agreement. One of the most challenging problems in utility grids is workflow scheduling, i.e., th...
It is not uncommon that grid users observe highly variable performance when they submit similar workloads at different times. From the users' point of view, such inconsistent performance is undesirable, and it leads to user dissatisfaction and confusion. We tackle this performance inconsistency problem using overprovisioning which is increasing the...
Betweenness Centrality (BC) is a powerful metric for identifying central nodes in complex network analysis, but its computation in large and dynamic systems is costly. Most of the previous approximations for computing BC are either restricted to only one type of network, or are too computationally inefficient to be applied to large or dynamically...
The analysis and modeling of the failures bound to occur in today's large-scale production systems is invaluable in providing the understanding needed to make these systems fault-tolerant yet efficient. Many previous studies have modeled failures without taking into account the time-varying behavior of failures, under the assumption that failures a...