Matei Ripeanu

  • PhD
  • Professor (Full) at University of British Columbia

About

187
Publications
51,802
Reads
8,795
Citations
Introduction
Current institution
University of British Columbia
Current position
  • Professor (Full)


Publications (187)
Conference Paper
Full-text available
Article
Full-text available
Ensuring the success of big graph processing for the next decade and beyond.
Article
Pattern matching is a fundamental tool for answering complex graph queries. Unfortunately, existing solutions have limited capabilities: They do not scale to process large graphs and/or support only a restricted set of search templates or usage scenarios. Moreover, the algorithms at the core of the existing techniques are not suitable for today’s g...
Preprint
Full-text available
Graphs are by nature unifying abstractions that can leverage interconnectedness to represent, explore, predict, and explain real- and digital-world phenomena. Although real users and consumers of graph instances and graph workloads understand these abstractions, future problems will require new abstractions and systems. What needs to happen in the...
Preprint
Full-text available
Pattern matching is a fundamental tool for answering complex graph queries. Unfortunately, existing solutions have limited capabilities: they do not scale to process large graphs and/or support only a restricted set of search templates or usage scenarios. We present an algorithmic pipeline that bases pattern matching on constraint checking. The key...
Chapter
In the face of large-scale automated social engineering attacks on large online services, fast detection and remediation of compromised accounts are crucial to limit the spread of the attack and to mitigate the overall damage to users, companies, and the public at large. We advocate a fully automated approach based on machine learning: we develop a...
Conference Paper
Detectable but Uncorrectable Errors (DUEs) in the memory subsystem are becoming increasingly frequent. Today, upon encountering a DUE, applications crash, and the recovery methods used incur significant performance, storage, and energy overheads. To mitigate the impact of these errors, we start from two high-level observations that apply to some cl...
Article
In the face of large-scale automated social engineering attacks on large online services, fast detection and remediation of compromised accounts are crucial to limit the spread of new attacks and to mitigate the overall damage to users, companies, and the public at large. We advocate a fully automated approach based on machine learning: we develop...
Conference Paper
In the face of large-scale automated cyber-attacks on large online services, fast detection and remediation of compromised accounts are crucial to limit the spread of new attacks and to mitigate the overall damage to users, companies, and the public at large. We advocate a fully automated approach based on machine learning to enable large-scale onl...
Conference Paper
Full-text available
This study characterizes the NVIDIA Jetson TK1 and TX1 Platforms, both built on an NVIDIA Tegra System on Chip and combining a quad-core ARM CPU and an NVIDIA GPU. Their heterogeneous nature, as well as their wide operating frequency range, makes it hard for application developers to reason about performance and determine which optimizations are wort...
Conference Paper
Full-text available
Requirements for reliability, low power consumption, and performance place complex and conflicting demands on the design of high-performance computing (HPC) systems. Fault-tolerance techniques such as checkpoint/restart (C/R) protect HPC applications against hardware faults. These techniques, however, have non-negligible overheads, particularly when...
Article
Interferometric Synthetic Aperture Radar (InSAR) is a remote sensing technology used for estimating the displacement of an object on the ground or the earth's surface itself. Persistent Scatterer-InSAR (PS-InSAR) is a category of time series algorithms enabling high resolution monitoring. PS-InSAR relies on successful selection of points that appea...
Article
This paper proposes using file system custom metadata as a bidirectional communication channel between applications and the storage middleware. This channel can be used to pass hints that enable cross-layer optimizations, an option hindered today by the ossified file-system interface. We study this approach in the context of storage system support...
Article
The wide adoption of graphics processing units (GPUs) as accelerators for general-purpose applications makes the end-to-end reliability implications of their use increasingly significant. Fault injection is a widely adopted method to evaluate the resilience of applications. However, building a fault injector for general-purpose GPU applications is...
Conference Paper
Full-text available
The orthodox paradigm to defend against automated social-engineering attacks in large-scale socio-technical systems is reactive and victim-agnostic. Defenses generally focus on identifying the attacks/attackers (e.g., phishing emails, social-bot infiltrations, malware offered for download). To change the status quo, we propose to identify, even if...
Article
Detecting fake accounts in online social networks (OSNs) protects both OSN operators and their users from various malicious activities. Most detection mechanisms attempt to classify user accounts as real (i.e., benign, honest) or fake (i.e., malicious, Sybil) by analyzing either user-level activities or graph-level structures. These mechanisms, how...
Conference Paper
Traditional defense mechanisms for fighting against automated fake accounts in online social networks are victim-agnostic. Even though victims of fake accounts play an important role in the viability of subsequent attacks, there is no work on utilizing this insight to improve the status quo. In this position paper, we take the first step and propos...
Article
Full-text available
Tagging is a popular feature that supports several collaborative tasks, including search, as tags produced by one user can help others find relevant content. However, task performance depends on the existence of 'good' tags. A first step towards creating incentives for users to produce 'good' tags is the quantification of their value in the firs...
Article
Full-text available
The Big Data challenge consists in managing, storing, analyzing and visualizing these huge and ever growing data sets to extract sense and knowledge. As the volume of data grows exponentially, the management of these data becomes more complex in proportion. A key point is to handle the complexity of the data life cycle, i.e. the various operations...
Conference Paper
Full-text available
Large scale-free graphs are famously difficult to process efficiently: the highly skewed vertex degree distribution makes it difficult to obtain balanced workload partitions for parallel processing. Our research instead aims to take advantage of vertex degree heterogeneity by partitioning the workload to match the strength of the individual computi...
Article
Multimedia content is central to our experience on the Web. Specifically, users frequently search and watch videos online. The textual features that accompany such content (e.g., title, description, and tags) can generally be optimized to attract more search traffic and ultimately to increase the advertisement-generated revenue. This study investig...
Conference Paper
Full-text available
As workflow-based data-intensive applications have become increasingly popular, the lack of support tools to aid resource provisioning decisions, to estimate the energy cost of running such applications, or simply to support configuration choices has become increasingly evident. Our goal is to design techniques to predict the energy consumption of...
Conference Paper
Developing a distributed system is a complex and error-prone task. Properly handling the interaction of a potentially large number of distributed components while keeping resource usage low and performance high is challenging. The state-of-the-practice on performance evaluation focuses on employing profilers to detect and fix potential performance...
Article
Full-text available
Graphs are widespread data structures used to model a wide variety of problems. The sheer amount of data to be processed has prompted the creation of a myriad of systems that help us cope with massive scale graphs. The pressure to deliver fast responses to queries on the graph is higher than ever before, as it is demanded by many applications (e.g....
Article
A large portion of the audience of video content items on the web currently comes from keyword-based search and/or tag-based navigation. Thus, the textual features of this content (e.g., the title, description, and tags) can directly impact the view count of a particular content item, and ultimately the advertisement-generated revenue. More importa...
Conference Paper
Infrastructure-as-a-Service (IaaS) clouds are an appealing resource for scientific computing. However, the bare-bones presentation of raw Linux virtual machines leaves much to the application developer. For many cloud applications, effective data handling is critical to efficient application execution. This paper investigates the capabilities of a...
Conference Paper
Full-text available
System provisioning, resource allocation, and system configuration decisions for I/O-intensive workflow applications are complex even for expert users. Users face choices at multiple levels: allocating resources to individual sub-systems (e.g., the application layer, the storage layer) and configuring each of these optimally (e.g., replication leve...
Conference Paper
Deduplication is a commonly used technique on disk-based storage pools. However, deduplication has not been used for tape-based pools: tape characteristics, such as high mount and seek times, combined with data fragmentation resulting from deduplication, create a toxic combination that leads to unacceptably high retrieval times. This work proposes...
Conference Paper
As a consequence of increasing hardware fault rates, HPC systems face significant challenges in terms of reliability. Evaluating the error resilience of HPC applications is an essential step for building efficient fault-tolerant mechanisms for these applications. In this paper, we propose a methodology to characterize the resilience of OpenMP progr...
Article
Full-text available
Online gaming is a multi-billion dollar industry that entertains a large, global population. One unfortunate phenomenon, however, poisons the competition and spoils the fun: cheating. The costs of cheating span from industry-supported expenditures to detect and limit it, to victims’ monetary losses due to cyber crime. This article studies cheaters...
Article
Full-text available
The Silences of the Archives, the Renown of the Story. The Martin Guerre affair has been told many times since Jean de Coras and Guillaume Lesueur published their stories in 1561. It is in many ways a perfect intrigue with uncanny resemblance, persuasive deception and a surprising end when the two Martins stood face to face, memory to memory, befor...
Conference Paper
While graphics processing units (GPUs) have gained wide adoption as accelerators for general-purpose applications (GPGPU), the end-to-end reliability implications of their use have not been quantified. Fault injection is a widely used method for evaluating the reliability of applications. However, building a fault injector for GPGPU applications is...
Article
Video content abounds on the Web. Although viewers may reach items via referrals, a large portion of the audience comes from keyword-based search. Consequently, the textual features of multimedia content (e.g., title, description, tags) will directly impact the view count of a particular item, and ultimately the advertisement-generated revenue. This...
Article
Full-text available
The increasing scale and wealth of inter-connected data, such as those accrued by social network applications, demand the design of new techniques and platforms to efficiently derive actionable knowledge from large-scale graphs. However, real-world graphs are famously difficult to process efficiently. Not only do they have a large memory footprint, bu...
Conference Paper
Full-text available
Data-intensive science offers new opportunities for innovation and discoveries, provided that large datasets can be handled efficiently. Data management for data-intensive science applications is challenging, requiring support for complex data life cycles, coordination across multiple sites, fault tolerance, and scalability to support tens of sites...
Conference Paper
This paper investigates the power, energy, and performance characteristics of large-scale graph processing on hybrid (i.e., CPU and GPU) single-node systems. Graph processing can be accelerated on hybrid systems by properly mapping the graph-layout to processing units, such that the algorithmic tasks exercise each of the units where they perform be...
Conference Paper
This paper proposes COntribution-based Incentive Design (COIN) as a general guideline for developing applications in the context of mobile crowd-sensing. As a case study, we apply the key ideas of crowd-sensing to design a smart parking system. The system encourages contribution by differentiating service and assigning tasks proactively to maintain...
Conference Paper
Sybil attacks in social and information systems have serious security implications. Out of many defence schemes, Graph-based Sybil Detection (GSD) has received the most attention from both academia and industry. Even though many GSD algorithms exist, there is no analytical framework to reason about their design, especially as they make different assumptio...
Conference Paper
Graph processing has gained renewed attention. The increasingly large scale and wealth of connected data, such as those accrued by social network applications, demand the design of new techniques and platforms to efficiently derive actionable information from large-scale graphs. Hybrid systems that host processing units optimized for both fast sequen...
Article
Full-text available
Configuring a storage system to better serve an application is a challenging task complicated by a multidimensional, discrete configuration space and the high cost of space exploration (e.g., by running the application with different storage configurations). To enable selecting the best configuration in a reasonable time, we design an end-to-end pe...
Article
Online Social Networks (OSNs) have attracted millions of active users and have become an integral part of today’s web ecosystem. Unfortunately, in the wrong hands, OSNs can be used to harvest private user data, distribute malware, control botnets, perform surveillance, spread misinformation, and even influence algorithmic trading. Usually, an adver...
Article
Full-text available
User-generated content is shaping the dynamics of the World Wide Web. Indeed, an increasingly large number of systems provide mechanisms to support the growing demand for content creation, sharing, and management. Tagging systems are a particular class of these systems where users share and collaboratively annotate content such as photos and URLs....
Article
Full-text available
This paper proposes using file system custom metadata as a bidirectional communication channel between applications and the storage system. This channel can be used to pass hints that enable cross-layer optimizations, an option hindered today by the ossified file-system interface. We study this approach in the context of storage system support for larg...
Conference Paper
An increasing number of mobile applications aim to enable "smart cities" by harnessing contributions from citizens armed with mobile devices that have sensing ability. However, there are few generally recognized guidelines for developing and deploying crowdsourcing-based solutions in mobile environments. This paper considers the design of a crowdso...
Conference Paper
We present a preliminary evaluation of the error resilience of GPGPU applications. We find that, compared to CPUs, these platforms lead to a higher rate of silent data corruption, a major concern since these errors are not flagged at runtime and often remain latent. We also find that out-of-bound memory accesses are the most critical cause of crashes....
Conference Paper
GPUs were originally designed for error-resilient workloads. Today, GPUs are used in error-sensitive applications, e.g., General Purpose GPU (GPGPU) applications. The goal of this project is to investigate the error resilience of GPGPU applications and understand their reliability characteristics. To this end, we employ fault injection on real G...
Conference Paper
Full-text available
Crowdsourcing has inspired a variety of novel mobile applications. However, identifying common practices across different applications is still challenging. In this paper, we use smart parking as a case study to investigate features of crowdsourcing that may apply to other mobile applications. Based on this we derive principles for efficiently harn...
Article
Full-text available
This paper presents an optimization mechanism to increase the performance of cloud services that transfer groups of deduplicated virtual machine (VM) images. This is necessary because the naive data transfer approach for groups of deduplicated VM images is extremely inefficient, as it generates a highly random disk access pattern. The optimization mechanis...
Conference Paper
Full-text available
Large, real-world graphs are famously difficult to process efficiently. Not only do they have a large memory footprint, but most graph processing algorithms entail memory access patterns with poor locality, data-dependent parallelism, and a low compute-to-memory access ratio. Additionally, most real-world graphs have a low diameter and a highly hetero...
Conference Paper
Full-text available
This paper evaluates the potential gains a workflow-aware storage system can bring. Two observations make us believe such a storage system is crucial to efficiently support workflow-based applications: First, workflows generate irregular and application-dependent data access patterns. These patterns render existing storage systems unable to harness a...
Conference Paper
Full-text available
The ease with which we adopt online personas and relationships has created a soft spot that cyber criminals are willing to exploit. Advances in artificial intelligence make it feasible to design bots that sense, think and act cooperatively in social settings just like human beings. In the wrong hands, these bots can be used to infiltrate online com...
Conference Paper
Full-text available
Online gaming is a multi-billion dollar industry that entertains a large, global population. One unfortunate phenomenon, however, poisons the competition and the fun: cheating. The costs of cheating span from industry-supported expenditures to detect and limit cheating, to victims' monetary losses due to cyber crime. This paper studies cheaters in...
Article
Full-text available
Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, an order of magnitude higher peak performance than traditional CPUs. This drop in the cost of computation, as any order-of-magnitude drop in the cost per unit of performance for a class of system components, triggers the opportunity to redes...
Preprint
Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, an order of magnitude higher peak performance than traditional CPUs. This drop in the cost of computation, as any order-of-magnitude drop in the cost per unit of performance for a class of system components, triggers the opportunity to redes...
Conference Paper
Cloud-based backup and archival services use large tape libraries as a cost-effective cold tier in their online storage hierarchy today. These services leverage deduplication to reduce the disk storage capacity required by their customer data sets, but they usually re-duplicate the data when moving it from disk to tape.
Article
Full-text available
Online gaming is a multi-billion dollar industry that entertains a large, global population. One unfortunate phenomenon, however, poisons the competition and the fun: cheating. The costs of cheating span from industry-supported expenditures to detect and limit cheating, to victims' monetary losses due to cyber crime. This paper studies cheaters in...
Conference Paper
Full-text available
Online Social Networks (OSNs) have become an integral part of today's Web. Politicians, celebrities, revolutionists, and others use OSNs as a podium to deliver their message to millions of active web users. Unfortunately, in the wrong hands, OSNs can be used to run astroturf campaigns to spread misinformation and propaganda. Such campaigns usually...
Conference Paper
Full-text available
Many-Task Computing (MTC) is a new application category that encompasses increasingly popular applications in biology, economics, and statistics. The high inter-task parallelism and data-intensive processing capabilities of these applications pose new challenges to existing supercomputer hardware-software stacks. These challenges include resource p...
Article
This paper discusses the use of many-task computing tools for multiscale modeling. It defines multiscale modeling and places different examples of it on a coupling spectrum, discusses the Swift parallel scripting language, describes three multiscale modeling applications that could use Swift, and then talks about how the Swift model is being extend...
Conference Paper
Full-text available
The energy costs of running computer systems are a growing concern: for large data centers, recent estimates put these costs higher than the cost of hardware itself. As a consequence, energy efficiency has become a pervasive theme for designing, deploying, and operating computer systems. This paper evaluates the energy trade-offs brought by data de...
Article
Full-text available
Web caches, content distribution networks, peer-to-peer file-sharing networks, distributed file systems, and data grids all have in common that they involve a community of users who use shared data. In each case, overall system performance can be improved significantly by first identifying and then exploiting the structure of the community's data acces...
Article
Full-text available
This paper explores the feasibility of a storage architecture that offers the reliability and access performance characteristics of a high-end system, yet is cost-efficient. We propose ThriftStore, a storage architecture that integrates two types of components: volatile, aggregated storage and dedicated, yet low-bandwidth durable storage. On the on...
Conference Paper
Full-text available
System logs are an important tool in studying the conditions (e.g., environment misconfigurations, resource status, erroneous user input) that cause failures. However, production system logs are complex, verbose, and lack structural stability over time. These traits make them hard to use, and make solutions that rely on them susceptible to high mai...
Conference Paper
Full-text available
This paper presents VMFlockMS, a migration service optimized for cross-datacenter transfer and instantiation of groups of virtual machine (VM) images that comprise an application-level solution (e.g., a three-tier web application). We dub these groups of related VM images VMFlocks. VMFlockMS employs two main techniques: first, data deduplication wi...
Article
Full-text available
The retrieval and analysis of malicious content is an essential task for security researchers. At the same time, the distributors of malicious files deploy countermeasures to evade the scrutiny of security researchers. This paper investigates two techniques used by malware download centers: frequently updating the malicious payload, and blacklisti...
Article
Full-text available
As distributed applications increase in size and complexity, traditional authorization architectures based on a dedicated authorization server become increasingly fragile because this decision point represents a single point of failure and a performance bottleneck. Authorization caching, which enables the reuse of previous authorization decisions,...
Article
Data structures that map well are required when porting applications to hybrid architectures such as graphics processing unit (GPU) based platforms. Hybrid platforms that use GPUs can deliver a higher peak computational rate and memory bandwidth. A GPU is used for the sequence alignment problem, which aims to find all occurrences of ea...
Conference Paper
Versatile storage systems aim to maximize storage resource utilization by supporting the ability to 'morph' the storage system to best match the application's demands. To this end, versatile storage systems significantly extend the deployment- or run-time configurability of the storage system. This flexibility, however, introduces a new problem: a...
Conference Paper
Full-text available
GPUs offer drastically different performance characteristics compared to traditional multicore architectures. To explore the tradeoffs exposed by this difference, we refactor MUMmer, a widely-used, highly-engineered bioinformatics application which has both CPU- and GPU-based implementations. We synthesize our experience as three high-level guideli...
Conference Paper
Full-text available
Assessing the value of individual users' contributions in peer-production systems is paramount to the design of mechanisms that support collaboration and improve users' experience. For instance, to incentivize contributions, file-sharing systems based on the BitTorrent protocol equate value with volume of contributed content and use a prioritizatio...
