
Hans-arno Jacobsen, University of Toronto (U of T)
Publications: 250
Reads: 32,061
Citations: 3,760
Publications (250)
The growing number of data centers consumes a vast amount of energy for processing. There is a desire to reduce the environmental footprint of the IT industry, and one way to achieve this is to use renewable energy sources. A challenge with using renewable resources is that the energy output is irregular as a consequence of the intermittent nature...
Byzantine fault-tolerant (BFT) consensus algorithms are at the core of providing safety and liveness guarantees for distributed systems that must operate in the presence of arbitrary failures. Recently, numerous new BFT algorithms have been proposed, not least due to the traction blockchain technologies have garnered in the search for consensus solutio...
Graph edge partitioning is an important preprocessing step to optimize distributed computing jobs on graph-structured data. The edge set of a given graph is split into $k$ equally-sized partitions, such that the replication of vertices across partitions is minimized. Out-of-core edge partitioning algorithms are able to tackle the problem with low m...
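The quality of an edge partitioning is usually judged by two numbers: the replication factor (the average number of partitions in which each vertex appears) and the balance of partition sizes. The sketch below computes both for an in-memory assignment; it assumes a plain dict from edge to partition id, and the function names are illustrative. It only evaluates a partitioning and is not an out-of-core partitioning algorithm.

```python
from collections import defaultdict


def replication_factor(assignment: dict[tuple[int, int], int]) -> float:
    """Average number of partitions each vertex is replicated in.

    `assignment` maps each undirected edge (u, v) to a partition id in [0, k).
    """
    partitions_of = defaultdict(set)  # vertex -> partitions holding at least one of its edges
    for (u, v), p in assignment.items():
        partitions_of[u].add(p)
        partitions_of[v].add(p)
    return sum(len(ps) for ps in partitions_of.values()) / len(partitions_of)


def balance(assignment: dict[tuple[int, int], int], k: int) -> float:
    """Largest partition size divided by the ideal size |E| / k (1.0 is perfect)."""
    sizes = [0] * k
    for p in assignment.values():
        sizes[p] += 1
    return max(sizes) / (len(assignment) / k)


# A triangle plus a pendant edge, split into k = 2 partitions.
edges = {(0, 1): 0, (1, 2): 0, (0, 2): 1, (2, 3): 1}
print(replication_factor(edges))  # 1.5: vertices 0 and 2 each live in both partitions
print(balance(edges, 2))          # 1.0: both partitions hold two edges
```

A replication factor of 1.0 would mean no vertex is cut; the partitioning objective is to push this number down while keeping the balance close to 1.0.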
Growing excitement around permissionless blockchains is uncovering their latent scalability concerns. Permissioned blockchains offer high transactional throughput and low latencies while compromising decentralization. In the quest for a decentralized, scalable blockchain fabric, i.e., to offer the scalability of permissioned blockchains in a permissio...
Leader-based consensus protocols must undergo a view-change phase to elect a new leader when the current leader fails. The new leader is often a candidate server that collects votes from a quorum of servers. However, voting-based election mechanisms intrinsically cause competition in leadership candidacy when each candidate collects on...
Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy. Maximizing resource utilization is becoming more challenging as the throughput of training processes increases with hardware innovations (e.g., faster GPUs, TPUs, and interconnects) and advanced parallelization techniques that yi...
Transfer learning is a well-studied concept in machine learning that relaxes the assumption that training and testing data need to be drawn from the same distribution. Recent success in applying transfer learning in the area of computer vision has motivated research on transfer learning in the context of time series data as well. This benefits learning i...
In smart grids, the large-scale integration of distributed renewable energy resources has enabled the provisioning of alternative sources of supply. Peer-to-peer (P2P) energy trading among local households is emerging as a technique that benefits both energy prosumers and operators. Since conventional energy supply is still needed to help fi...
We present a new approach for designing reliable and scalable overlay networks to support topic-based pub/sub communication. We propose the MinAvg-kTCO problem parameterized by k: use the minimum number of edges to create a k-topic-connected overlay (kTCO) for pub/sub systems, i.e., for each topic, the sub-overlay induced by nodes interested in the...
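To make the connectivity requirement concrete, the sketch below checks the k = 1 case only: for every topic, the sub-overlay induced by the nodes interested in that topic must be connected. It assumes the overlay is given as an undirected edge list and interests as a dict from node to topic set; the function name is hypothetical, and checking higher values of k (or solving MinAvg-kTCO itself) is not covered.

```python
from collections import deque


def is_topic_connected(overlay_edges, interests):
    """Return True if, for every topic, the nodes interested in it induce a
    connected sub-overlay (the k = 1 case of k-topic-connectivity).

    overlay_edges: iterable of undirected edges (a, b) between nodes.
    interests: dict mapping node -> set of topics the node subscribes to.
    """
    adjacency = {n: set() for n in interests}
    for a, b in overlay_edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    for topic in set().union(*interests.values()):
        members = {n for n, ts in interests.items() if topic in ts}
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        while queue:  # BFS restricted to the induced sub-overlay
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr in members and nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        if seen != members:
            return False
    return True


interests = {"A": {"x"}, "B": {"x", "y"}, "C": {"y"}, "D": {"x"}}
overlay = [("A", "B"), ("B", "C"), ("C", "D")]
print(is_topic_connected(overlay, interests))                 # False: D reaches x only via C
print(is_topic_connected(overlay + [("B", "D")], interests))  # True
```

The example shows why the problem is about edge budgets: a path overlay is connected as a whole, yet topic x still needs an extra edge so that its subscribers can exchange messages without relaying through an uninterested node.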
Current Internet of Things (IoT) infrastructures rely on cloud storage; however, relying on a single cloud provider places limitations on IoT applications and Service Level Agreement (SLA) requirements. Recently, multiple decentralized storage solutions (e.g., based on blockchains) have entered the market with distinct architecture, Quality of Ser...
Due to the recent explosion of data volume and velocity, a new array of lightweight key-value stores has emerged to serve as alternatives to traditional databases. The majority of these storage engines, however, sacrifice their read performance in order to cope with write throughput by avoiding random disk access when writing a record in favor of...
Distributed systems that manage and process graph-structured data internally solve a graph partitioning problem to minimize their communication overhead and query run-time. Besides computational complexity (optimal graph partitioning is NP-hard), another important consideration is the memory overhead. Real-world graphs often have an immense size...
Permissioned blockchain systems promise to provide both decentralized trust and privacy. Hyperledger Fabric is currently one of the most widespread permissioned blockchain systems and is heavily promoted in both industry and academia. Due to its optimistic concurrency model, the transaction failure rates in Fabric can become a bottleneck. While th...
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Monitoring the internal conditions of a machine is essential to increase its production efficiency and to reduce energy waste. Non-intrusive condition monitoring techniques, such as analysing electrical signals, provide insights by disaggregating a composite signal of a machine as a whole into the individual components to determine their states. De...
Due to the recent explosion of data volume and velocity, a new array of lightweight key-value stores has emerged to serve as alternatives to traditional databases. The majority of these storage engines, however, sacrifice their read performance in order to cope with write throughput by avoiding random disk access when writing a record in favor of fast...
The accurate detection of appliance state transitions in electrical signals is fundamental for numerous energy-conserving applications. We present an extensive overview and categorization of the current state of the art in event detection on high-sampling-rate signals. Existing approaches are designed for specific environments and need to be tediously adapted...
With energy consumption in high-performance computing clouds growing rapidly, energy saving has become an important topic. Virtualization provides opportunities to save energy by enabling one physical machine (PM) to host multiple virtual machines (VMs). Dynamic voltage and frequency scaling (DVFS) is another technology to reduce energy consumption...
In traditional IP-based publish/subscribe middleware, events must detour through an overlay network to be matched against the defined filters, which adds latency to event delivery from publishers to subscribers. The emerging Software Defined Networking (SDN) creates boundless possibilities to improve the efficiency of event delivery b...
Deep Learning (DL) has had immense success in the recent past, leading to state-of-the-art results in various domains, such as image recognition and natural language processing. One of the reasons for this success is the increasing size of DL models and the availability of vast amounts of training data. To keep on improving the...
Current Internet of Things (IoT) infrastructures, with their massive data requirements, rely on cloud storage; however, using a single cloud storage provider can limit IoT applications in terms of service requirements (performance, availability, security, etc.). Multi-cloud storage architecture has emerged as a promising infrastructure to...
Graph partitioning is an important preprocessing step for distributed graph processing. In edge partitioning, the edge set of a given graph is split into $k$ equally-sized partitions, such that the replication of vertices across partitions is minimized. Streaming is a viable approach to partition graphs that exceed the memory capacities of a single...
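Streaming edge partitioners assign each edge once, as it arrives, using only a small amount of per-vertex state. The sketch below uses a simple greedy rule (prefer a partition that already holds both endpoints, then one that holds either endpoint, then the least loaded one, subject to a capacity bound). This rule is a generic illustration chosen here, not the algorithm of the work above; the `imbalance` parameter and the requirement to know `num_edges` up front are assumptions of the sketch.

```python
import math
from collections import defaultdict
from typing import Iterable


def greedy_stream_partition(edges: Iterable[tuple[int, int]], k: int, num_edges: int,
                            imbalance: float = 1.1) -> dict[tuple[int, int], int]:
    """Assign each streamed edge to one of k partitions in a single pass.

    Capacity per partition is ceil(imbalance * num_edges / k); with imbalance >= 1
    and at most num_edges edges, some partition always has room.
    """
    capacity = math.ceil(imbalance * num_edges / k)
    replicas = defaultdict(set)  # vertex -> partitions that already hold a copy of it
    load = [0] * k               # edges assigned to each partition so far
    assignment = {}
    for u, v in edges:
        open_parts = [p for p in range(k) if load[p] < capacity]
        both = [p for p in open_parts if p in replicas[u] and p in replicas[v]]
        either = [p for p in open_parts if p in replicas[u] or p in replicas[v]]
        candidates = both or either or open_parts
        p = min(candidates, key=lambda q: load[q])  # tie-break on current load
        assignment[(u, v)] = p
        load[p] += 1
        replicas[u].add(p)
        replicas[v].add(p)
    return assignment


stream = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
parts = greedy_stream_partition(stream, k=2, num_edges=len(stream))
print([parts[e] for e in stream])  # [0, 0, 0, 0, 1, 1, 1]
```

Only vertex 3 ends up in both partitions here; the capacity bound is what forces the second triangle into partition 1 instead of letting partition 0 absorb every edge.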
Distribution System Operators (DSOs) face several challenges in managing comprehensive and up-to-date models of distribution grids. To address these problems, we propose a crowdsourcing framework for collecting data on grid devices. We also provide an inference approach for generating topological models of the distribution grids. Since distribution cables...
Self-driving cars rely on several services to operate, some of which require financial interactions, such as paying for parking spaces or, in the case of electric vehicles, paying for battery charging. Providing these services demands the cooperation of several parties and organizations that do not necessarily trust each other. Over the past few...
With in-memory key-value caches such as Redis and Memcached being a key component for many systems to improve throughput and reduce latency, cloud caches have been widely adopted, allowing small companies to deploy their own cache systems. However, data security is still a major concern that affects the adoption of cloud caches. Tenant's data stored in...
With the increased adoption of blockchain technologies, permissioned blockchains such as Hyperledger Fabric provide a robust ecosystem for developing production-grade decentralized applications. However, the additional latency between executing and committing transactions, due to Fabric's three-phase transaction lifecycle of Execute-Order-Validate...
Event logs of process-aware information systems play an increasingly critical role in today's enterprises because they are the basis for a number of business intelligence applications such as complex event processing, provenance analysis, performance analysis, and process mining. However, due to incorrect manual recording, system errors, and resour...
Electrical energy consumption has been an ongoing research area since the advent of smart homes and the Internet of Things. Consumption characteristics and usage profiles are directly influenced by building occupants and their interaction with electrical appliances. Data analysis together with machine learning models can be utilized to extract valuabl...
Appliance event detection is an elementary step in the NILM pipeline. Unfortunately, several types of appliances (e.g., switch-mode power supplies (SMPS) or multi-state appliances) are known to challenge state-of-the-art event detection systems due to their noisy consumption profiles. By stepping away from distinct event definitions, we learn from a consumer...
With the advancement of cloud computing, many challenging scientific problems can be solved using scientific workflow technology which integrates geo-distributed instruments, applications, and big data effectively and efficiently. For workflow collaboration, the workflow protocols of all participants are needed. However, workflow protocols are not...
Efficient real-time analytics are an integral part of an increasing number of data management applications, such as computational targeted advertising, algorithmic trading, and Internet of Things. In this paper, we focus primarily on accelerating stream joins, which are arguably one of the most commonly used and resource-intensive operators in stre...
Serverless computing simplifies the life cycle of scalable web applications by delegating most of the operational concerns to the cloud providers. One prominent serverless platform is Apache OpenWhisk, which is employed by IBM Cloud. Despite the apparent benefits of serverless computing, some limitations of the serverless platform, such as the...
Directed Acyclic Graph (DAG) based Distributed Ledger Technologies (DLTs) such as the IOTA Tangle have been proposed to address the inefficiencies of traditional blockchains, including the issues with scalability, high resource consumption, and increasing transaction fees. Despite the promising features introduced by IOTA, the properties of DAG-base...
Despite the very high volatility of the cryptocurrency markets, interest in the development and adoption of existing cryptocurrencies such as Bitcoin, as well as new distributed ledger technologies, is increasing. Therefore, understanding the security and vulnerability issues of such blockchain systems plays a critical role. In this work, we pr...
Building reliable and scalable publish/subscribe (pub/sub) systems requires tremendous development effort. The serverless paradigm simplifies the development and deployment of highly available applications by delegating most of the operational concerns to the cloud providers. The serverless paradigm describes a programming model, where the develope...
The success and growing popularity of blockchain technology has led to a significant increase in load on popular permissionless blockchains such as Ethereum. With the current design, these blockchain systems do not scale with additional nodes since every node executes every transaction. Further efforts are therefore necessary to develop scalable p...
Cryptocurrencies and Distributed Ledger Technologies, such as Ethereum, have received extensive attention over the past few years. With the increasing popularity of Ethereum, a comprehensive understanding of its various properties plays a critical role in its widespread adoption. However, due to the significant requirements for deploying a full Ethe...
Known for powering cryptocurrencies such as Bitcoin and Ethereum, blockchain is seen as a disruptive technology capable of revolutionizing a wide variety of domains, ranging from finance to governance, by offering superior security, reliability, and transparency founded upon a decentralized and democratic computational model. In this tutorial, we f...
Data usage is a significant concern, particularly in smartphone applications, M2M communications, and Internet of Things (IoT) applications. Messages in these domains are often exchanged with a backend infrastructure using publish/subscribe (pub/sub). Shared dictionary compression has been shown to reduce data usage in pub/sub networks beyond th...
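The idea behind shared dictionary compression can be illustrated with zlib's preset-dictionary support (`zdict`), which lets sender and receiver prime DEFLATE with byte strings that messages are expected to repeat. This is a minimal sketch, not the mechanism of the work above: the dictionary contents and the sample message are hypothetical, and how dictionaries are built and distributed to pub/sub clients is out of scope.

```python
import zlib
from typing import Optional

# Hypothetical shared dictionary for one topic. The Python zlib docs advise
# placing the most common substrings at the end of the dictionary.
SHARED_DICT = b'"unit": "celsius", "timestamp": {"sensor_id": "temp-", "temperature": '


def compress(msg: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Deflate one message, optionally primed with a preset dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict) if zdict else zlib.compressobj(level=9)
    return c.compress(msg) + c.flush()


def decompress(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Inflate one message; the receiver must hold the same dictionary as the sender."""
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(data) + d.flush()


msg = b'{"sensor_id": "temp-01", "temperature": 21.5, "unit": "celsius", "timestamp": 1700000000}'
plain = compress(msg)
primed = compress(msg, SHARED_DICT)
print(len(msg), len(plain), len(primed))  # the primed output should be the smallest
```

Short messages like this compress poorly on their own because DEFLATE has no history to match against; the preset dictionary supplies that history, which is why the gain is largest for the small, repetitive payloads typical of IoT pub/sub traffic.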
Real models of electrical transmission grids are difficult to obtain. The process of generating such models from unstructured and incomplete data is tedious, and the resulting models are rarely updated. This paper proposes a novel approach for automatically extracting power-relevant data from the public and unstructured crowdsourced OpenStreetMap (...
Especially in large-scale distributed systems, where a huge number of resources and processes have to be coordinated, the system’s complexity reaches the limits of human capabilities. Parts of the application have to be scaled within seconds in order to handle an increasing number of requests. In addition, it is desirable that the entire ap...
Boolean expression matching is an important function for many applications. However, existing solutions still suffer from limitations when applied to high-dimensional and dense workloads. To overcome these limitations, in this paper, we design a data structure called PS-Tree that can efficiently index subscriptions in one dimension. By dividing pre...
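To pin down the matching semantics that an index such as the PS-Tree accelerates, the sketch below shows the brute-force baseline: a subscription is a conjunction of inclusive interval predicates, and an event matches if every constrained attribute is present and falls in its interval. The predicate model and names are assumptions for illustration; the PS-Tree itself is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    attribute: str
    low: float
    high: float  # inclusive interval [low, high]


def matches(subscription, event):
    """True if every predicate's attribute is in the event and lies in its interval."""
    return all(p.attribute in event and p.low <= event[p.attribute] <= p.high
               for p in subscription)


def match_all(subscriptions, event):
    """Brute-force baseline: evaluate every subscription against the event.

    Cost grows with the total number of predicates, which is what an index
    over high-dimensional, dense workloads tries to avoid.
    """
    return [sid for sid, sub in subscriptions.items() if matches(sub, event)]


subs = {
    "s1": [Predicate("price", 10, 20), Predicate("volume", 0, 100)],
    "s2": [Predicate("price", 30, 40)],
}
print(match_all(subs, {"price": 15, "volume": 50}))  # ['s1']
print(match_all(subs, {"price": 35}))                # ['s2']
```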
In the era of the Internet and big data, contemporary workflows have become increasingly large in scale and complex in structure, introducing greater challenges for workflow modeling. In terms of control flow, workflows often lack maximized concurrency and block-structuredness, even when languages supporting block-structuredness (e.g., BPEL) are employed. Exi...
Popularly known for powering cryptocurrencies such as Bitcoin and Ethereum, blockchains are seen as a disruptive technology capable of impacting a wide variety of domains, ranging from finance to governance, by offering superior security, reliability, and transparency in a decentralized manner. In this tutorial presentation, we first study the origi...
The recent success of electric vehicles leads to unprecedentedly high peaks of demand on the electric grid at the times when most people charge their cars. In order to avoid unreasonably rising costs due to inefficient utilization of the electricity infrastructure, we propose EVA: a scheduling system to solve the valley filling problem by distribut...
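Valley filling means placing flexible charging demand into the low points of the base load so that the combined profile is as flat as possible. The sketch below is the classic centralized water-filling formulation, found by bisection on a common fill level; it is not EVA's distributed scheduling algorithm, and it assumes a single aggregate, perfectly flexible EV energy demand with no per-vehicle rate limits or deadlines.

```python
def valley_fill(base_load, energy, iterations=100):
    """Water-filling: find a level L such that topping every time slot up to L
    uses exactly `energy`, then charge max(0, L - base) in each slot."""
    lo, hi = min(base_load), max(base_load) + energy  # usage is 0 at lo and >= energy at hi
    for _ in range(iterations):  # bisection on the fill level
        level = (lo + hi) / 2
        used = sum(max(0.0, level - b) for b in base_load)
        if used < energy:
            lo = level
        else:
            hi = level
    level = (lo + hi) / 2
    return [max(0.0, level - b) for b in base_load]


# Non-EV demand in four time slots and 4 units of EV energy to schedule.
print(valley_fill([5, 3, 1, 4], 4))  # approximately [0, 1, 3, 0]: the load flattens at 4
```

A fixed iteration count is used instead of a tolerance loop so the bisection always terminates, even when load values are large enough that floating-point spacing exceeds any small tolerance.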
An essential security concern in the publish/subscribe paradigm is that of guaranteeing the confidentiality of the data being transmitted. Existing solutions require that some initial parameters, keys or secrets be exchanged or otherwise established between communicating entities before secure end-to-end communication can occur. Most existing solut...
With the ongoing integration of Renewable Energy Sources (RES), the complexity of power grids is increasing. Due to the fluctuating nature of RES, ensuring the reliability of power grids can be challenging. One possible approach for addressing these challenges is Demand Response (DR), which is described as matching the demand for electrical energy a...
Maintaining a complete and up-to-date model of the distribution grid is a challenging task, and the scarcity of open models represents a significant bottleneck for researchers in this area. In this work, we address these challenges by introducing a crowdsourcing framework for the collection of open data on distribution grid devices and an algorithm...
Historically, performance and price-performance of computer systems have been the key purchasing arguments for customers. However, with rising energy costs and increasing power consumption due to the ever-growing demand for compute power (servers, storage, networks), electricity bills have become a significant expense for today's data centers. In o...
We investigate the use of content-based publish/subscribe for data dissemination in large-scale applications with expressive filtering requirements. In particular, we focus on top-k subscription filtering, where a publication is delivered only to the k best ranked subscribers, as ordered using expressive semantics such as relevance, fairness, and d...
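Top-k subscription filtering changes the delivery rule from "every matching subscriber" to "only the k best-ranked subscribers". The sketch below shows that rule in its simplest centralized form, using `heapq.nlargest` over a score. The Jaccard keyword score is a hypothetical stand-in for the expressive ranking semantics (relevance, fairness, diversity) mentioned above, and distributed evaluation across brokers is not modeled.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Subscription:
    subscriber: str
    keywords: set[str]  # hypothetical interest representation


def relevance(sub: Subscription, publication_terms: set[str]) -> float:
    """Hypothetical score: Jaccard similarity of subscription keywords and publication terms."""
    union = sub.keywords | publication_terms
    return len(sub.keywords & publication_terms) / len(union) if union else 0.0


def top_k_subscribers(subs, publication_terms, k):
    """Deliver only to the k highest-scoring subscribers with a positive score."""
    scored = ((relevance(s, publication_terms), s.subscriber) for s in subs)
    return [name for score, name in heapq.nlargest(k, scored) if score > 0]


subs = [
    Subscription("alice", {"gpu", "training"}),
    Subscription("bob", {"gpu"}),
    Subscription("carol", {"energy"}),
]
print(top_k_subscribers(subs, {"gpu", "training", "pipeline"}, k=2))  # ['alice', 'bob']
```

Because subscribers with a zero score are dropped after selection, fewer than k subscribers may receive a publication when few of them are relevant.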
Building scalable, highly available publish/subscribe (pub/sub) systems can require sophisticated algorithms and a tremendous amount of engineering effort. This paper demonstrates a way to build a pub/sub broker on top of the OpenWhisk serverless platform that performs topic-based and content-based matching. This approach radically simplifies the d...
Since the introduction of Bitcoin in 2008, blockchain systems have evolved immensely in terms of performance and usability. There is a massive focus on building enterprise blockchain solutions, with providers such as IBM and Microsoft already providing Blockchain-as-a-Service (BaaS). To facilitate the adoption of blockchain technologies across vari...
Massively multiplayer online role-playing games (MMORPGs) allow thousands of players to interact with each other in a large-scale virtual environment. Interest management is an important technique used to improve the scalability of a game by limiting the amount of information transmitted to the players according to their relevance. In this paper, we...
Following the success of Bitcoin, Ethereum and Hyperledger, blockchains are now gaining widespread adoption in a wide variety of applications, using a diversity of distributed ledger systems with varying characteristics. Yet, beyond the original Bitcoin protocol, the safety and reliability properties of such systems are not sufficiently analyzed. T...
The advent of Web 2.0 companies, such as Facebook, Google, and Amazon with their insatiable appetite for vast amounts of structured, semi-structured, and unstructured data, triggered the development of Hadoop and related tools, e.g., YARN, MapReduce, and Pig, as well as NoSQL databases. These tools form an open source software stack to support the...
This paper addresses the use of smart-home sensor streams for continuous prediction of energy loads of individual households that participate as agents in local markets. We introduce a new device-level energy consumption dataset recorded over three years which includes high-resolution energy measurements from electrical devices collected within...
Distributed content-based publish/subscribe systems provide a selective, scalable, and decentralized approach to data dissemination. In a pub/sub overlay network, hop-by-hop routing allows brokers to correctly forward messages without requiring global knowledge. However, this model causes brokers to forward publications without knowing the volume a...
A key feature of database systems is to provide transparent access to stored data. In distributed database systems, this includes data allocation and fragmentation. Transparent access introduces data dependencies and increases system complexity and inter-process communication. Therefore, many developers are trading transparency for better scalab...