Seif Haridi

KTH Royal Institute of Technology | KTH · Department of Software and Computer systems

Professor of Computer Systems, Department of Software and Computer Systems, KTH Royal Institute of Technology

About

238 Publications
48,756 Reads
5,909 Citations
Additional affiliations
January 1997 - present: Swedish Institute of Computer Science, Principal Investigator
January 1997 - May 2018: KTH Royal Institute of Technology, Professor (Full)

Publications (238)
Article
Vehicular ad hoc networks (VANETs) have received a great amount of interest, especially in wireless communications technology. In VANETs, vehicles are equipped with various intelligent sensors that can collect real-time data from inside and from surrounding vehicles. These real-time data require powerful computation, processing, and storage. Howeve...
Preprint
Full-text available
Agreement among a set of processes and in the presence of partial failures is one of the fundamental problems of distributed systems. In the most general case, many decisions must be agreed upon over the lifetime of a system with dynamically changing membership. Such a sequence of decisions represents a distributed log, and can form the underlying...
Article
Full-text available
Oz is a programming language designed to support multiple programming paradigms in a clean, factored way that is easy to program despite its broad coverage. It started in 1991 as a collaborative effort by the DFKI (Germany) and SICS (Sweden) and led to an influential system, Mozart, that was released in 1999 and widely used in the 2000s for practica...
Article
Full-text available
In recent years, OpenACC has been used on many supercomputers and has attracted many non-computer-science specialists who parallelize their programs in different scientific fields, including weather forecasting and simulations. OpenACC is a high-level programming model that supports parallelism and is easy to learn and use by adding high-level directi...
Conference Paper
Contemporary end-to-end data pipelines need to combine many diverse workloads such as machine learning, relational operations, stream dataflows, tensor transformations, and graphs. For each of these workload types, there exist several frontends (e.g., SQL, Beam, Keras) based on different programming languages as well as different runtimes (e.g., S...
Conference Paper
The Hadoop Distributed File System (HDFS) is designed to handle massive amounts of data, preferably stored in very large files. The poor performance of HDFS in managing small files has long been a bane of the Hadoop community. In many production deployments of HDFS, almost 25% of the files are less than 16 KB in size and as much as 42% of all the f...
Conference Paper
Full-text available
The growing popularity of Android and the increasing amount of sensitive data stored in mobile devices have led to the dissemination of Android ransomware. Ransomware is a class of malware that makes data inaccessible by blocking access to the device or, more frequently, by encrypting the data; to recover the data, the user has to pay a ransom to...
Conference Paper
Message-based programming frameworks facilitate the development and execution of core distributed computing algorithms today. Their twofold aim is to expose a programming model that minimises logical errors incurred during translation from an algorithmic specification to executable program, and also to provide an efficient runtime for event pattern...
Conference Paper
In less than a decade, mobile apps became an integral part of our lives. In several situations it is important to provide assurance that a mobile app is authentic, i.e., that it is indeed the app produced by a certain company. However, this is challenging, as such apps can be repackaged, the user malicious, or the app tampered with by an attacker....
Conference Paper
Full-text available
Android has become the most widely used mobile operating system (OS) in recent years. There is much research on methods for detecting malicious Android applications. Dynamic analysis methods detect such applications by evaluating their behaviour during execution. However, such mechanisms may be ineffective as malware is often able to disable antima...
Article
Full-text available
Stream processors are emerging in industry as an apparatus that drives analytical but also mission-critical services handling the core of persistent application logic. Thus, apart from scalability and low latency, a rising system need is first-class support for application state together with strong consistency guarantees, and adaptivity to cluster...
Article
Full-text available
In this paper we present KompicsTesting, a framework for unit testing components in the Kompics component model. Components in Kompics are event-driven entities which communicate asynchronously solely by message passing. Similar to actors in the actor model, they do not share their internal state in message-passing, making them less prone to errors...
Chapter
In our data-centric society, online services, decision making, and other aspects are increasingly becoming heavily dependent on trends and patterns extracted from data. A broad class of societal-scale data management problems requires system support for processing unbounded data with low latency and high throughput. Large-scale data stream processi...
Conference Paper
Full-text available
Aggregation queries on data streams are evaluated over evolving and often overlapping logical views called windows. While the aggregation of periodic windows was extensively studied in the past through the use of aggregate sharing techniques such as Panes and Pairs, little to no work has been put into optimizing the aggregation of very common, non-p...
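The Panes technique named above shares work across overlapping periodic windows: the stream is cut into non-overlapping panes whose length is gcd(range, slide), each pane is aggregated once, and every window result is obtained by combining the partial aggregates of the panes it covers. A minimal sketch of that classic idea, assuming a sum aggregate over count-based windows with integer range and slide; the function name is hypothetical, and this is not the non-periodic technique the paper itself proposes.

```python
from math import gcd


def pane_window_sums(values, window_range, slide):
    """Sum over count-based sliding windows using Panes (illustrative sketch).

    Each window covers `window_range` consecutive items and a new window
    starts every `slide` items. Items are first aggregated into panes of
    length gcd(window_range, slide); each window is then the sum of the
    panes it spans, so raw items are summed only once, at pane level.
    """
    pane_len = gcd(window_range, slide)
    # Partial aggregate for each complete pane (a trailing partial pane is ignored).
    panes = [sum(values[i:i + pane_len])
             for i in range(0, len(values) - pane_len + 1, pane_len)]
    panes_per_window = window_range // pane_len
    panes_per_slide = slide // pane_len
    results = []
    start = 0
    # Emit one result per window that is fully covered by complete panes.
    while start + panes_per_window <= len(panes):
        results.append(sum(panes[start:start + panes_per_window]))
        start += panes_per_slide
    return results


print(pane_window_sums(list(range(1, 9)), window_range=4, slide=2))
```

With range 4 and slide 2, consecutive windows overlap by one pane, so each pane sum is reused by two windows instead of recomputing the overlapping items.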
Conference Paper
While algorithms for streaming graph partitioning have proved promising, they fall short of creating timely partitions when applied to large graphs. For example, it takes 415 seconds for a state-of-the-art partitioner to work on a social network graph with 117 million edges. We introduce an efficient platform for boosting streaming graph partit...
Article
We present the SC-ABD algorithm that implements sequentially consistent distributed shared memory (DSM). The algorithm tolerates fewer than half of the processes being faulty (crash-stop). Compared to the multi-writer ABD algorithm, SC-ABD requires one instead of two round-trips of communication to perform a write operation, and an equal number o...
Conference Paper
Distributed systems are becoming an increasingly important part of systems and applications software and it is widely accepted that writing correct distributed systems is challenging. Message-passing concurrency models are the dominant programming paradigm and, even in statically typed languages, programming frameworks typically only have limited t...
Article
Full-text available
Efficient processing of large-scale graphs in distributed environments has been an increasingly popular topic of research in recent years. Inter-connected data that can be modeled as graphs arise in application domains such as machine learning, recommendation, web search, and social network analysis. Writing distributed graph applications is inhere...
Article
Full-text available
Recent improvements in both the performance and scalability of shared-nothing, transactional, in-memory NewSQL databases have reopened the research question of whether distributed metadata for hierarchical file systems can be managed using commodity databases. In this paper, we introduce HopsFS, a next generation distribution of the Hadoop Distribu...
Conference Paper
We present the SC-ABD algorithm that implements sequentially consistent distributed shared memory (DSM). The algorithm tolerates fewer than half of the processes being faulty (crash-stop). Compared to the multi-writer ABD algorithm, SC-ABD requires one instead of two round-trips of communication to perform a write operation, and an equal number o...
Article
Full-text available
Distributed stateful stream processing enables the deployment and execution of large-scale continuous computations in the cloud, targeting both low latency and high throughput. One of the most fundamental challenges of this paradigm is providing processing guarantees under potential failures. Existing approaches rely on periodic global state snapsh...
Article
Full-text available
Balanced graph partitioning is an NP-complete problem with a wide range of applications. These applications include many large-scale distributed problems, including the optimal storage of large sets of graph-structured data over several hosts. However, in very large-scale distributed scenarios, state-of-the-art algorithms are not directly applicabl...
Article
Full-text available
In recent years, adaptive HTTP streaming protocols have become the de facto standard in the industry for the distribution of live and video-on-demand content over the Internet. This paper presents SmoothCache 2.0, a distributed cache platform for adaptive HTTP live streaming content based on peer-to-peer (P2P) overlays. The contribution of this...
Article
Full-text available
Apache Flink is an open-source system for processing streaming and batch data. Flink is built on the philosophy that many classes of data processing applications, including real-time analytics, continuous data pipelines, historic data processing (batch), and iterative algorithms (machine learning, graph analysis) can be expressed and executed as...
Article
Iterative computations are at the core of large-scale graph processing. In these applications, a set of parameters is continuously refined, until a fixed point is reached. Such fixed point iterations often exhibit non-uniform computational behavior, where changes propagate with different speeds throughout the parameter set, making them active or in...
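The idea of parameters becoming active or inactive during a fixed-point iteration can be illustrated with a worklist: only parameters whose value changed are re-processed, so converged regions stop consuming work while others keep iterating. A minimal sketch, using connected components by min-label propagation as a hypothetical example workload; it is a generic single-machine illustration, not the paper's system.

```python
from collections import deque


def connected_components(num_vertices, edges):
    """Min-label propagation driven by a worklist of active vertices.

    Every vertex starts with its own id as label. A vertex is (re)queued
    only when its label shrinks, so the iteration reaches the fixed point
    (every vertex holds the smallest id in its component) while doing
    work only where values are still changing.
    """
    neighbors = [[] for _ in range(num_vertices)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    labels = list(range(num_vertices))
    active = deque(range(num_vertices))  # initially every vertex is active
    in_queue = [True] * num_vertices
    while active:
        u = active.popleft()
        in_queue[u] = False
        for v in neighbors[u]:
            # Push the smaller label; v becomes active only if it changed.
            if labels[u] < labels[v]:
                labels[v] = labels[u]
                if not in_queue[v]:
                    in_queue[v] = True
                    active.append(v)
    return labels


print(connected_components(6, [(0, 1), (1, 2), (3, 4)]))
```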
Conference Paper
Graph processing has become an integral part of big data analytics. With the ever-increasing size of the graphs, one needs to partition them into smaller clusters, which can be managed and processed more easily on multiple machines in a distributed fashion. While there exist numerous solutions for edge-cut partitioning of graphs, very little effort...
Conference Paper
Distributed key-value stores provide scalable, fault-tolerant, and self-organizing storage services, but fall short of guaranteeing linearizable consistency in partially synchronous, lossy, partitionable, and dynamic networks, when data is distributed and replicated automatically by the principle of consistent hashing [14]. This work introduces con...
Conference Paper
Balanced graph partitioning is a well-known NP-complete problem with a wide range of applications. These applications include many large-scale distributed problems, including the optimal storage of large sets of graph-structured data over several hosts, a key problem in today's Cloud infrastructure. However, in very large-scale distributed scenarios,...
Conference Paper
Distributed key-value stores employed in data centers treat each key-value pair as a shared memory register. For fault-tolerance and performance, each key-value pair is replicated. Various models exist for the consistency of data amongst the replicas. While atomic consistency, also known as linearizability, provides the strongest form of consistenc...
Conference Paper
Monitoring the global state of an overlay network is vital for the self-management of peer-to-peer (P2P) systems. Gossip-based algorithms are a well-known technique that can provide nodes locally with aggregated knowledge about the state of the overlay network. In this paper, we present a gossip-based protocol to estimate the global distribution of...
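Gossip-based aggregation lets each node learn a global aggregate from local exchanges alone. A minimal sketch of the simplest such scheme, pairwise averaging, assuming a fully connected network in which any node can contact any other; it estimates only the global mean and is not the distribution-estimation protocol of the paper. The function name and parameters are hypothetical.

```python
import random


def gossip_average(values, rounds, seed=0):
    """Pairwise-averaging gossip over a fully connected set of nodes (sketch).

    In every round each node contacts one uniformly random peer and both
    adopt the average of their two estimates. Each exchange preserves the
    sum of all estimates, so every estimate converges towards the global
    mean without any node ever seeing all the values. Requires >= 2 nodes.
    """
    rng = random.Random(seed)  # seeded for reproducible runs
    estimates = list(values)
    n = len(estimates)
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # choose a peer different from i
            avg = (estimates[i] + estimates[j]) / 2
            estimates[i] = estimates[j] = avg
    return estimates


est = gossip_average(list(range(20)), rounds=50)
print(min(est), max(est))  # both close to the true mean 9.5
```

Pairwise averaging is used here because it is easy to verify: the conserved sum makes the true mean the only possible fixed point.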
Conference Paper
View synchrony is a communications paradigm for building reliable distributed systems. Testing a protocol using view synchrony with a simulated implementation of view synchrony allows the tested protocol to be exposed to the full timing range allowed by the view synchrony model. This both reduces the complexity of the test environment and increases...
Conference Paper
Message-passing concurrency (MPC) is increasingly being used to build systems software that scales well on multi-core hardware. Functional programming implementations of MPC, such as Erlang, have also leveraged their stateless nature to build middleware that is not just scalable, but also dynamically reconfigurable. However, many middleware platfor...
Conference Paper
We propose consistent quorums to achieve linearizability in scalable and self-organizing key-value stores based on consistent hashing.
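The two ingredients named here can be sketched generically: consistent hashing maps a key to the first node at or after the key's position on an identifier ring, the key is replicated on that node and its next successors, and an operation is accepted once a majority of that replica group responds. A minimal sketch, assuming SHA-1 ring positions truncated to 32 bits, a replication degree of 3, and hypothetical names (`Ring`, `replica_group`, `is_majority`); it does not implement the consistent-quorum mechanism the paper proposes.

```python
import bisect
import hashlib

RING_BITS = 32  # identifier space is [0, 2**RING_BITS)


def ring_position(name):
    """Hash a node name or key onto the identifier ring."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


class Ring:
    def __init__(self, nodes, replication_degree=3):
        self.replication_degree = replication_degree
        # Nodes sorted by ring position.
        self.points = sorted((ring_position(n), n) for n in nodes)

    def replica_group(self, key):
        """The key's successor node followed by the next nodes clockwise."""
        positions = [p for p, _ in self.points]
        start = bisect.bisect_left(positions, ring_position(key))
        count = min(self.replication_degree, len(self.points))
        # Modulo wraps past the highest position back to the start of the ring.
        return [self.points[(start + k) % len(self.points)][1]
                for k in range(count)]


def is_majority(acks, group):
    """True once more than half of the replica group has acknowledged."""
    return len(set(acks) & set(group)) > len(group) // 2


ring = Ring([f"node{i}" for i in range(8)])
group = ring.replica_group("user:42")
print(group, is_majority(group[:2], group))
```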
Conference Paper
In the last decade, numerous structured overlay networks were proposed as a scalable infrastructure to build large-scale distributed systems under dynamic environments. These overlays were touted to be fault-tolerant and self-managing, yet, as we show in this paper, they fall short of handling some extreme scenarios they envision. These scenarios i...
Article
Peer-to-peer live media streaming over the Internet is becoming increasingly popular, though it is still a challenging problem. Nodes should receive the stream with respect to intrinsic timing constraints, while the overlay should adapt to the changes in the network and the nodes should be incentivized to contribute their resources. In this wo...
Conference Paper
Peer-to-peer (P2P) video streaming is an emerging technology that reduces the barrier to stream live events over the Internet. Unfortunately, satisfying soft real-time constraints on the delay between the generation of the stream and its actual delivery to users is still a challenging problem. Bottlenecks in the available upload bandwidth, both at...
Conference Paper
Peer2View is a commercial peer-to-peer live video streaming (P2PLS) system. The novelty of Peer2View is threefold: i) It is the first P2PLS platform to support HTTP as transport protocol for live content, ii) The system supports both single and multi-bitrate streaming modes of operation, and iii) It makes use of an application-layer dynamic congest...
Conference Paper
Structured overlay networks, like any distributed system, use replication to avoid losing data in the presence of failures. In this paper, we discuss the shortcomings of existing replication schemes and propose a technique for replication, called ID-Replication. ID-Replication allows different replication degrees for keys in the system, thus allow...
Conference Paper
Full-text available
In this paper, we present SmoothCache, a peer-to-peer live video streaming (P2PLS) system. The novelty of SmoothCache is three-fold: i) It is the first P2PLS system that is built to support the relatively-new approach of using HTTP as the transport protocol for live content, ii) The system supports both single and multi-bitrate streaming modes of o...
Article
Full-text available
The rapid proliferation of social media, online communities, and collectively produced knowledge resources has accelerated the convergence of technological and social networks, resulting in a dynamic ecosystem of online social networking services, environments, and applications. The proliferation of online social networks (OSNs) has had a profound...
Conference Paper
Full-text available
This paper presents the design and implementation of the Dynamic Transport Library (DTL), a UDP-based reliable transport library, initially designed for, but not limited to, peer-to-peer applications. DTL combines many features not simultaneously offered by any other transport library including: i) Wide scope of congestion control levels starting...
Article
Full-text available
The publish/subscribe communication model has become an indispensable part of Web 2.0 applications, such as social networks and news syndication. Although there exist a few systems that provide a genuinely scalable service for the topic-based publish/subscribe model, the content-based solutions still suffer from restricted subscription schemes,...
Conference Paper
Full-text available
Peer-to-Peer (P2P) live video streaming over the Internet is becoming increasingly popular, but it is still plagued by problems of high playback latency and intermittent playback streams. This paper presents GLive, a distributed market-based solution that builds a mesh overlay for P2P live streaming. The mesh overlay is constructed such that (i) no...
Conference Paper
Full-text available
Gossip-based peer sampling protocols have been widely used as a building block for many large-scale distributed applications. However, Network Address Translation gateways (NATs) cause most existing gossiping protocols to break down, as nodes cannot establish direct connections to nodes behind NATs (private nodes). In addition, most of the existing...
Article
Full-text available
When evaluating Peer-to-Peer content distribution systems by means of simulation, it is of vital importance to correctly mimic the bandwidth dynamics behaviour of the underlying network. In this paper, we propose a scalable and accurate flow-level network simulation model based on an evolution of the classical progressive filling algorithm which fo...
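The classical progressive filling algorithm that this model evolves from computes a max-min fair bandwidth allocation: the rates of all still-growing flows are raised together until some link saturates, every flow crossing a saturated link is frozen, and the process repeats. A minimal sketch of the classical algorithm only (not the paper's scalable evolution of it), assuming each flow is given as the set of links it traverses and every such link appears in `capacities`; the names are hypothetical.

```python
def progressive_filling(capacities, flows):
    """Max-min fair rates via classical progressive filling (sketch).

    capacities: dict mapping link -> capacity
    flows: dict mapping flow -> set of links the flow traverses
    Flows that traverse no link are left at rate 0.
    """
    rates = {f: 0.0 for f in flows}
    remaining = dict(capacities)
    active = {f for f in flows if flows[f]}
    while active:
        # Number of still-growing flows on each link.
        load = {link: sum(1 for f in active if link in flows[f])
                for link in remaining}
        # Largest equal increment before the first link saturates.
        increment = min(remaining[link] / n for link, n in load.items() if n > 0)
        for f in active:
            rates[f] += increment
        for link, n in load.items():
            remaining[link] -= increment * n
        saturated = {link for link, n in load.items()
                     if n > 0 and remaining[link] <= 1e-12}
        # Flows crossing a saturated link are frozen at their current rate.
        active = {f for f in active if not (flows[f] & saturated)}
    return rates


print(progressive_filling({"A": 10.0, "B": 4.0},
                          {"f1": {"A"}, "f2": {"A", "B"}, "f3": {"B"}}))
```

In this example link B saturates first, freezing f2 and f3 at 2 each; f1 then takes the 6 units left on link A, ending at 8.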
Article
Full-text available
Peer-to-peer overlay networks are attractive solutions for building Internet-scale publish/subscribe systems. However, scalability comes with a cost: a message published on a certain topic often needs to traverse a large number of uninterested (unsubscribed) nodes before reaching all its subscribers. This might sharply increase resource consumption...
Conference Paper
Full-text available
Peer-to-peer overlay networks are attractive solutions for building Internet-scale publish/subscribe systems. However, scalability comes with a cost: a message published on a certain topic often needs to traverse a large number of uninterested (unsubscribed) nodes before reaching all its subscribers. This might sharply increase resource consumption...
Conference Paper
Full-text available
In this paper we present an exploration of central coordination as a way of managing P2P live streaming overlays. The main point is to show the elements needed to construct a system with that approach. A key element in the feasibility of this approach is a near real-time optimization engine for peer selection. Peer organization in a way that enable...
Article
Full-text available
Scalable Internet services are distributed over multiple nodes, which may fail at any time. Events such as partial failures, software errors, and attacks, as well as churn are increasingly becoming normal events. As a result, the design of Internet services is getting more complex and predicting their behavior is daunting. We address these problem...
Article
Full-text available
In this paper we present what are, in our experience, the best practices in Peer-to-Peer (P2P) application development and how we combined them in a middleware platform called Mesmerizer. We explain how simulation is an integral part of the development process and not just an assessment tool. We then present our component-based event-driven framew...
Conference Paper
Full-text available
Live streaming of video content using overlay networks has gained widespread adoption on the Internet. This paper presents Sepidar, a distributed market-based model that builds and maintains overlay network trees of approximately minimal height for delivering live media as a number of substreams. A streaming tree is constructed for each...
Conference Paper
Peer-to-peer systems have recently gained tremendous popularity in both research and commercial endeavors. This paper argues for the systematic exploration of a hybrid of centralized and peer-to-peer system design. We give an example application of peer-to-peer architecture to an inherently centralized service and show how this applicat...
Chapter
Structured overlay networks form a major class of peer-to-peer systems, which are touted for their abilities to scale, tolerate failures, and self-manage. Any long-lived Internet-scale distributed system is destined to face network partitions. Consequently, the problem of network partitions and mergers is highly related to fault-tolerance and self-...
Conference Paper
Full-text available
This paper presents gradienTv, a distributed, market-based approach to live streaming. In gradienTv, multiple streaming trees are constructed using a market-based approach, such that nodes with increasing upload bandwidth are located closer to the media source at the roots of the trees. Market-based approaches, however, exhibit slow convergence pro...
Conference Paper
Full-text available
This paper deals with solving large instances of the Linear Sum Assignment Problems (LSAPs) under real-time constraints, using Graphical Processing Units (GPUs). The motivating scenario is an industrial application for P2P live streaming that is moderated by a central tracker that is periodically solving LSAP instances to optimize the connectivity o...
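For reference, the Linear Sum Assignment Problem in its standard textbook form: given an n-by-n cost matrix C = (c_ij), assign each row (for instance, a peer) to exactly one column (for instance, a connection slot) so that the total cost is minimal. The notation and the peer/slot reading are generic illustrations, not taken from the paper.

```latex
\min_{x}\ \sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}\,x_{ij}
\quad\text{subject to}\quad
\sum_{j=1}^{n} x_{ij} = 1 \ \ (i = 1,\dots,n), \qquad
\sum_{i=1}^{n} x_{ij} = 1 \ \ (j = 1,\dots,n), \qquad
x_{ij} \in \{0, 1\}.

% Equivalently: find a permutation \sigma of \{1,\dots,n\} minimizing
\sum_{i=1}^{n} c_{i,\sigma(i)}.
```

Because the feasible set is exactly the n! permutation matrices, exhaustive search quickly becomes infeasible, which is why polynomial-time methods such as the Hungarian algorithm are used for large instances.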
Conference Paper
Full-text available
Key/value stores that are built on structured overlay networks often lack support for atomic transactions and strong data consistency among replicas. This is unfortunate, because consistency guarantees and transactions would allow a wide range of additional application domains to benefit from the inherent scalability and fault-tolera...