Hans-Arno Jacobsen

University of Toronto | U of T

About

250 Publications
32,061 Reads
3,760 Citations

Publications (250)
Article
Full-text available
The growing number of data centers consumes a vast amount of energy for processing. There is a desire to reduce the environmental footprint of the IT industry, and one way to achieve this is to use renewable energy sources. A challenge with using renewable resources is that the energy output is irregular as a consequence of the intermittent nature...
Preprint
Byzantine fault-tolerant (BFT) consensus algorithms are at the core of providing safety and liveness guarantees for distributed systems that must operate in the presence of arbitrary failures. Recently, numerous new BFT algorithms have been proposed, not least due to the traction blockchain technologies have garnered in the search for consensus solutio...
Preprint
Graph edge partitioning is an important preprocessing step to optimize distributed computing jobs on graph-structured data. The edge set of a given graph is split into $k$ equally-sized partitions, such that the replication of vertices across partitions is minimized. Out-of-core edge partitioning algorithms are able to tackle the problem with low m...
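The objective stated above can be made concrete with the replication factor, a commonly used quality measure for edge partitionings: the average number of partitions in which each vertex appears. The following is a minimal Python sketch, assuming edges are given as vertex pairs and an assignment as a list of partition ids; the function names are hypothetical, and the code only evaluates a given partitioning rather than reproducing the out-of-core algorithm from the paper.

```python
from collections import defaultdict


def replication_factor(edges, assignment):
    """Average number of partitions each vertex is replicated in.

    edges: list of (u, v) vertex pairs
    assignment: assignment[i] is the partition id of edges[i]
    """
    replicas = defaultdict(set)  # vertex -> set of partitions holding one of its edges
    for (u, v), p in zip(edges, assignment):
        replicas[u].add(p)
        replicas[v].add(p)
    return sum(len(ps) for ps in replicas.values()) / len(replicas)


def is_balanced(assignment, k, tolerance=0.0):
    """Check that each of the k partitions holds at most (1 + tolerance) * |E| / k edges."""
    counts = [0] * k
    for p in assignment:
        counts[p] += 1
    return max(counts) <= (1 + tolerance) * len(assignment) / k


if __name__ == "__main__":
    cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(replication_factor(cycle, [0, 0, 1, 1]))  # contiguous split
    print(replication_factor(cycle, [0, 1, 0, 1]))  # alternating split
```

On the 4-cycle, splitting into two contiguous paths replicates only the two shared endpoints, while alternating edges places every vertex in both partitions; both splits are equally balanced, which is why the replication term is what a partitioner must minimize.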
Preprint
Growing excitement around permissionless blockchains is uncovering their latent scalability concerns. Permissioned blockchains offer high transactional throughput and low latencies while compromising decentralization. In the quest for a decentralized, scalable blockchain fabric, i.e., to offer the scalability of permissioned blockchains in a permissio...
Preprint
Leader-based consensus protocols must undergo a view-change phase to elect a new leader when the current leader fails. The new leader is often a candidate server that collects votes from a quorum of servers. However, voting-based election mechanisms intrinsically cause competition in leadership candidacy when each candidate collects on...
Preprint
Full-text available
Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy. Maximizing resource utilization is becoming more challenging as the throughput of training processes increases with hardware innovations (e.g., faster GPUs, TPUs, and interconnects) and advanced parallelization techniques that yi...
Article
Full-text available
Transfer learning is a well-studied concept in machine learning that relaxes the assumption that training and testing data need to be drawn from the same distribution. Recent success in applying transfer learning in computer vision has motivated research on transfer learning in the context of time series data as well. This benefits learning i...
Article
In smart grids, the large-scale integration of distributed renewable energy resources has enabled the provisioning of alternative sources of supply. Peer-to-peer (P2P) energy trading among local households is becoming an emerging technique that benefits both energy prosumers and operators. Since conventional energy supply is still needed to help fi...
Article
We present a new approach for designing reliable and scalable overlay networks to support topic-based pub/sub communication. We propose the MinAvg-kTCO problem parameterized by k: use the minimum number of edges to create a k-topic-connected overlay (kTCO) for pub/sub systems, i.e., for each topic, the sub-overlay induced by nodes interested in the...
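For the base case k = 1, the topic-connectivity requirement can be checked directly: for every topic, the nodes interested in it must form a connected sub-overlay. A minimal Python sketch, assuming an undirected overlay given as an edge list and each node's interests as a set of topics; the names are hypothetical, and both the generalization to k > 1 and the edge-minimizing MinAvg-kTCO construction are not shown.

```python
from collections import defaultdict


def is_topic_connected(nodes_topics, edges):
    """Return True if, for every topic, the sub-overlay induced by the nodes
    interested in that topic is connected (the k = 1 case)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Group nodes by the topics they are interested in.
    members = defaultdict(set)
    for node, topics in nodes_topics.items():
        for t in topics:
            members[t].add(node)

    for group in members.values():
        # Depth-first search that only walks edges inside this topic's group.
        start = next(iter(group))
        seen = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w in group and w not in seen:
                    seen.add(w)
                    stack.append(w)
        if seen != group:
            return False
    return True


if __name__ == "__main__":
    interests = {"a": {"x"}, "b": {"x", "y"}, "c": {"y"}}
    print(is_topic_connected(interests, [("a", "b"), ("b", "c")]))
    print(is_topic_connected(interests, [("a", "c"), ("b", "c")]))
```

In the second overlay, the nodes interested in topic `x` (a and b) are linked only through c, which is not interested in `x`, so the check fails even though the overlay as a whole is connected.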
Article
Current Internet of Things (IoT) infrastructures rely on cloud storage; however, relying on a single cloud provider places limitations on IoT applications and Service Level Agreement (SLA) requirements. Recently, multiple decentralized storage solutions (e.g., based on blockchains) have entered the market with distinct architecture, Quality of Ser...
Conference Paper
Full-text available
Due to the recent explosion of data volume and velocity, a new array of lightweight key-value stores have emerged to serve as alternatives to traditional databases. The majority of these storage engines, however, sacrifice their read performance in order to cope with write throughput by avoiding random disk access when writing a record in favor of...
Preprint
Full-text available
Distributed systems that manage and process graph-structured data internally solve a graph partitioning problem to minimize their communication overhead and query run-time. Besides computational complexity -- optimal graph partitioning is NP-hard -- another important consideration is the memory overhead. Real-world graphs often have an immense size...
Preprint
Full-text available
Permissioned blockchain systems promise to provide both decentralized trust and privacy. Hyperledger Fabric is currently one of the most widespread permissioned blockchain systems and is heavily promoted both in industry and academia. Due to its optimistic concurrency model, the transaction failure rates in Fabric can become a bottleneck. While th...
Preprint
Full-text available
IEEE Transactions on Knowledge and Data Engineering
Article
Full-text available
Monitoring the internal conditions of a machine is essential to increase its production efficiency and to reduce energy waste. Non-intrusive condition monitoring techniques, such as analysing electrical signals, provide insights by disaggregating a composite signal of a machine as a whole into the individual components to determine their states. De...
Article
Full-text available
Due to the recent explosion of data volume and velocity, a new array of lightweight key-value stores have emerged to serve as alternatives to traditional databases. The majority of these storage engines, however, sacrifice their read performance in order to cope with write throughput by avoiding random disk access when writing a record in favor of fast...
Article
The accurate detection of appliance state transitions in electrical signals is fundamental for numerous energy-conserving applications. We present an extensive overview and categorization of the current state in event detection on high-sampling-rate signals. Existing approaches are designed for specific environments and need to be tediously adapted...
Article
With energy consumption in high-performance computing clouds growing rapidly, energy saving has become an important topic. Virtualization provides opportunities to save energy by enabling one physical machine (PM) to host multiple virtual machines (VMs). Dynamic voltage and frequency scaling (DVFS) is another technology to reduce energy consumption...
Article
Full-text available
In traditional IP-based publish/subscribe middleware, events must take a detour through an overlay network to be matched against defined filters, which adds latency overhead when delivering events from publishers to subscribers. The emerging Software Defined Networking (SDN) creates boundless possibilities to improve the efficiency of event delivery b...
Article
Deep Learning (DL) has had immense success in the recent past, leading to state-of-the-art results in various domains, such as image recognition and natural language processing. One of the reasons for this success is the increasing size of DL models and the proliferation of vast amounts of training data being available. To keep on improving the...
Conference Paper
Current Internet of Things (IoT) infrastructures, with their massive data requirements, rely on cloud storage; however, using a single cloud storage provider can limit IoT applications in terms of service requirements (performance, availability, security, etc.). Multi-cloud storage architecture has emerged as a promising infrastructure to...
Preprint
Full-text available
Graph partitioning is an important preprocessing step to distributed graph processing. In edge partitioning, the edge set of a given graph is split into $k$ equally-sized partitions, such that the replication of vertices across partitions is minimized. Streaming is a viable approach to partition graphs that exceed the memory capacities of a single...
Article
Distribution System Operators (DSOs) face several challenges in managing comprehensive and up-to-date models of distribution grids. To address these problems, we propose a crowdsourcing framework for collecting grid devices. We also provide an inference approach for generating topological models of the distribution grids. Since distribution cables...
Conference Paper
Self-driving cars rely on several services to operate, some of which require financial interactions, such as paying for parking spaces or paying for battery charging in the case of electric vehicles. Providing these services demands the cooperation of several parties and organizations that do not necessarily trust each other. Over the past few...
Conference Paper
With in-memory key-value caches such as Redis and Memcached being a key component for many systems to improve throughput and reduce latency, cloud caches have been widely adopted by small companies to deploy their own cache systems. However, data security is still a major concern, which affects the adoption of cloud caches. Tenant's data stored in...
Conference Paper
Full-text available
With the increased adoption of blockchain technologies, permissioned blockchains such as Hyperledger Fabric provide a robust ecosystem for developing production-grade decentralized applications. However, the additional latency between executing and committing transactions, due to Fabric's three-phase transaction lifecycle of Execute-Order-Validate...
Article
Event logs of process-aware information systems play an increasingly critical role in today's enterprises because they are the basis for a number of business intelligence applications such as complex event processing, provenance analysis, performance analysis, and process mining. However, due to incorrect manual recording, system errors, and resour...
Conference Paper
Electrical energy consumption has been an ongoing research area since the advent of smart homes and the Internet of Things. Consumption characteristics and usage profiles are directly influenced by building occupants and their interaction with electrical appliances. Data analysis together with machine learning models can be utilized to extract valuabl...
Conference Paper
Appliance event detection is an elementary step in the NILM pipeline. Unfortunately, several types of appliances (e.g., switching mode power supply (SMPS) or multi-state) are known to challenge state-of-the-art event detection systems due to their noisy consumption profiles. By stepping away from distinct event definitions, we learn from a consumer...
Article
With the advancement of cloud computing, many challenging scientific problems can be solved using scientific workflow technology which integrates geo-distributed instruments, applications, and big data effectively and efficiently. For workflow collaboration, the workflow protocols of all participants are needed. However, workflow protocols are not...
Article
Full-text available
Efficient real-time analytics are an integral part of an increasing number of data management applications, such as computational targeted advertising, algorithmic trading, and Internet of Things. In this paper, we focus primarily on accelerating stream joins, which are arguably one of the most commonly used and resource-intensive operators in stre...
Conference Paper
Serverless computing simplifies the life cycle of scalable web applications by delegating most of the operational concerns to the cloud providers. One prominent serverless platform is Apache OpenWhisk, which is employed by IBM Cloud. Despite the apparent benefits of serverless computing, some limitations of the serverless platform, such as the...
Conference Paper
Directed Acyclic Graph (DAG) based Distributed Ledger Technologies (DLT) such as IOTA Tangle have been proposed to address the inefficiencies of traditional blockchains, including the issues with scalability, high resource consumption, and increasing transaction fees. Despite the promising features introduced by IOTA, the properties of DAG-base...
Conference Paper
Despite the very high volatility of the cryptocurrency markets, interest in the development and adoption of existing cryptocurrencies such as Bitcoin, as well as of new distributed ledger technologies, is increasing. Therefore, understanding the security and vulnerability issues of such blockchain systems plays a critical role. In this work, we pr...
Conference Paper
Building reliable and scalable publish/subscribe (pub/sub) systems requires tremendous development effort. The serverless paradigm simplifies the development and deployment of highly available applications by delegating most of the operational concerns to the cloud providers. The serverless paradigm describes a programming model, where the develope...
Conference Paper
Full-text available
The success and growing popularity of blockchain technology has led to a significant increase in load on popular permissionless blockchains such as Ethereum. With the current design, these blockchain systems do not scale with additional nodes since every node executes every transaction. Further efforts are therefore necessary to develop scalable p...
Conference Paper
Cryptocurrencies and Distributed Ledger Technologies, such as Ethereum, have received extensive attention over the past few years. With the increasing popularity of Ethereum, a comprehensive understanding of its various properties plays a critical role in its widespread adoption. However, due to the significant requirements for deploying a full Ethe...
Conference Paper
Full-text available
Known for powering cryptocurrencies such as Bitcoin and Ethereum, blockchain is seen as a disruptive technology capable of revolutionizing a wide variety of domains, ranging from finance to governance, by offering superior security, reliability, and transparency founded upon a decentralized and democratic computational model. In this tutorial, we f...
Conference Paper
Data usage is a significant concern, particularly in smartphone applications, M2M communications and for Internet of Things (IoT) applications. Messages in these domains are often exchanged with a backend infrastructure using publish/subscribe (pub/sub). Shared dictionary compression has been shown to reduce data usage in pub/sub networks beyond th...
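Shared dictionary compression can be illustrated with Python's standard `zlib` module, whose `compressobj` and `decompressobj` accept a preset dictionary through the `zdict` argument. A minimal sketch, assuming publisher and subscriber already hold the same dictionary; the dictionary and sample message below are hypothetical and do not reflect how the paper builds or distributes dictionaries.

```python
import zlib
from typing import Optional

# Hypothetical shared dictionary: byte sequences that typical messages are expected to contain.
SHARED_DICT = b'{"sensor_id": , "temperature": , "humidity": , "timestamp": , "unit": "celsius"}'

# Hypothetical small pub/sub payload of the kind that compresses poorly on its own.
SAMPLE = b'{"sensor_id": 17, "temperature": 21.5, "humidity": 40, "timestamp": 1700000000, "unit": "celsius"}'


def compress(msg: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Deflate one message, optionally priming the compressor with a shared dictionary."""
    c = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return c.compress(msg) + c.flush()


def decompress(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Inflate one message; the receiver must hold the same dictionary as the sender."""
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(data) + d.flush()


if __name__ == "__main__":
    plain = compress(SAMPLE)
    primed = compress(SAMPLE, SHARED_DICT)
    assert decompress(primed, SHARED_DICT) == SAMPLE
    print(len(SAMPLE), len(plain), len(primed))
```

Because each message is short, it contains little internal redundancy for Deflate to exploit on its own; priming the compressor lets it encode the recurring field names as back-references into the dictionary instead.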
Article
Real models of electrical transmission grids are difficult to obtain. The process of generating such models from unstructured and incomplete data is tedious, and the resulting models are rarely updated. This paper proposes a novel approach for automatically extracting power-relevant data from the public and unstructured crowdsourced OpenStreetMap (...
Thesis
Especially in large-scale distributed systems, where a huge number of resources and processes have to be coordinated, the system’s complexity reaches the limits of human capabilities. Parts of the application have to be scaled within seconds in order to handle an increasing number of requests. In addition to that, it is desirable that the entire ap...
Article
Full-text available
Boolean expression matching is an important function for many applications. However, existing solutions still suffer from limitations when applied to high-dimensional and dense workloads. To overcome these limitations, in this paper, we design a data structure called PS-Tree that can efficiently index subscriptions in one dimension. By dividing pre...
Article
In the era of the Internet and big data, contemporary workflows are becoming increasingly large in scale and complex in structure, introducing greater challenges for workflow modeling. Workflows are not necessarily maximally concurrent or block-structured in their control flow, even though languages supporting block-structuredness (e.g., BPEL) are employed. Exi...
Conference Paper
Full-text available
Popularly known for powering cryptocurrencies such as Bitcoin and Ethereum, blockchain is seen as a disruptive technology capable of impacting a wide variety of domains, ranging from finance to governance, by offering superior security, reliability, and transparency in a decentralized manner. In this tutorial presentation, we first study the origi...
Conference Paper
Full-text available
The recent success of electric vehicles leads to unprecedentedly high peaks of demand on the electric grid at the times when most people charge their cars. In order to avoid unreasonably rising costs due to inefficient utilization of the electricity infrastructure, we propose EVA: a scheduling system to solve the valley filling problem by distribut...
Conference Paper
An essential security concern in the publish/subscribe paradigm is that of guaranteeing the confidentiality of the data being transmitted. Existing solutions require that some initial parameters, keys or secrets be exchanged or otherwise established between communicating entities before secure end-to-end communication can occur. Most existing solut...
Conference Paper
With the ongoing integration of Renewable Energy Sources (RES), the complexity of power grids is increasing. Due to the fluctuating nature of RES, ensuring the reliability of power grids can be challenging. One possible approach for addressing these challenges is Demand Response (DR), which is described as matching the demand for electrical energy a...
Conference Paper
Maintaining a complete and up-to-date model of the distribution grid is a challenging task, and the scarcity of open models represents a significant bottleneck for researchers in this area. In this work, we address these challenges by introducing a crowdsourcing framework for the collection of open data on distribution grid devices and an algorithm...
Conference Paper
Historically, performance and price-performance of computer systems have been the key purchasing arguments for customers. However, with rising energy costs and increasing power consumption due to the ever-growing demand for compute power (servers, storage, networks), electricity bills have become a significant expense for today's data centers. In o...
Conference Paper
Full-text available
We investigate the use of content-based publish/subscribe for data dissemination in large-scale applications with expressive filtering requirements. In particular, we focus on top-k subscription filtering, where a publication is delivered only to the k best ranked subscribers, as ordered using expressive semantics such as relevance, fairness, and d...
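Top-k delivery can be sketched as a filter followed by a ranking step: find the subscribers whose subscriptions match the publication, then keep only the k highest-scored ones. A minimal Python sketch, assuming range-based subscriptions and a caller-supplied `score` function that stands in for the paper's ranking semantics (relevance, fairness, diversity), which are not modeled here; all names and data are hypothetical.

```python
import heapq


def matches(subscription, publication):
    """A subscription maps attribute -> (low, high); every range must contain the published value."""
    return all(attr in publication and lo <= publication[attr] <= hi
               for attr, (lo, hi) in subscription.items())


def top_k_subscribers(publication, subscriptions, score, k):
    """Return the ids of the k best-ranked subscribers whose subscriptions match."""
    candidates = [(score(sub_id, publication), sub_id)
                  for sub_id, sub in subscriptions.items()
                  if matches(sub, publication)]
    # nlargest keeps only the k highest (score, id) pairs, in descending order.
    return [sub_id for _, sub_id in heapq.nlargest(k, candidates)]


# Hypothetical example data: three range subscriptions and fixed per-subscriber weights.
SUBS = {
    "s1": {"price": (0, 100)},
    "s2": {"price": (50, 200)},
    "s3": {"price": (150, 300)},
}
WEIGHTS = {"s1": 0.4, "s2": 0.9, "s3": 1.0}

if __name__ == "__main__":
    print(top_k_subscribers({"price": 80}, SUBS, lambda s, p: WEIGHTS[s], 1))
```

Here `s3` has the highest weight but does not match a price of 80, so with k = 1 only `s2` receives the publication; ranking is applied strictly after content-based matching.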
Conference Paper
Full-text available
Building scalable, highly available publish/subscribe (pub/sub) systems can require sophisticated algorithms and a tremendous amount of engineering effort. This paper demonstrates a way to build a pub/sub broker on top of the OpenWhisk serverless platform that performs topic-based and content-based matching. This approach radically simplifies the d...
Conference Paper
Full-text available
Since the introduction of Bitcoin in 2008, blockchain systems have evolved immensely in terms of performance and usability. There is a massive focus on building enterprise blockchain solutions, with providers such as IBM and Microsoft already providing Blockchain-as-a-Service (BaaS). To facilitate the adoption of blockchain technologies across vari...
Conference Paper
Full-text available
Massively multiplayer online role-playing games (MMORPGs) allow thousands of players to interact with each other in a large-scale virtual environment. Interest management is an important technique used to raise the scalability of a game by limiting the amount of information transmitted to the players according to their relevance. In this paper, we...
Conference Paper
Full-text available
Following the success of Bitcoin, Ethereum and Hyperledger, blockchains are now gaining widespread adoption in a wide variety of applications, using a diversity of distributed ledger systems with varying characteristics. Yet, beyond the original Bitcoin protocol, the safety and reliability properties of such systems are not sufficiently analyzed. T...
Conference Paper
The advent of Web 2.0 companies, such as Facebook, Google, and Amazon with their insatiable appetite for vast amounts of structured, semi-structured, and unstructured data, triggered the development of Hadoop and related tools, e.g., YARN, MapReduce, and Pig, as well as NoSQL databases. These tools form an open source software stack to support the...
Article
This paper addresses the use of smart-home sensor streams for continuous prediction of energy loads of individual households that participate as agents in local markets. We introduce a new device-level energy consumption dataset, recorded over three years, which includes high-resolution energy measurements from electrical devices collected within...
Conference Paper
Distributed content-based publish/subscribe systems provide a selective, scalable, and decentralized approach to data dissemination. In a pub/sub overlay network, hop-by-hop routing allows brokers to correctly forward messages without requiring global knowledge. However, this model causes brokers to forward publications without knowing the volume a...
Conference Paper
A key feature of database systems is to provide transparent access to stored data. In distributed database systems, this includes data allocation and fragmentation. Transparent access introduces data dependencies and increases system complexity and inter-process communication. Therefore, many developers are exchanging transparency for better scalab...