Lluís Vilanova’s research while affiliated with Imperial College London and other places


Publications (26)


Faabric: Fine-Grained Distribution of Scientific Workloads in the Cloud

Figure 3: Control points (triggered when application code calls functions from supported APIs, before the FAABRIC implementation of the given function is executed).
Figure 11: Scaling the VM count (makespan to execute a batch of jobs, which increases linearly with cluster size, and the distribution of execution times).
Figure 12: Speed-up executing shared-memory applications (when the number of threads spans multiple VMs, FAABRIC's time is compared with OpenMP's execution time with 8 threads).
  • Preprint

February 2023 · 59 Reads · Eleftheria Mappoura · [...] · Peter Pietzuch

With their high parallelism and resource needs, many scientific applications benefit from cloud deployments. Today, scientific applications are executed on dedicated pools of VMs, resulting in resource fragmentation: users pay for underutilised resources, and providers cannot reallocate unused resources between applications. While serverless cloud computing could address these issues, its programming model is incompatible with the use of shared memory and message passing in scientific applications: serverless functions do not share memory directly on the same VM or support message passing semantics when scheduling functions dynamically. We describe Faabric, a new serverless cloud runtime that transparently distributes applications with shared memory and message passing across VMs. Faabric achieves this by scheduling computation in a fine-grained (thread/process) fashion through a new execution abstraction called Granules. To support shared memory, Granules are isolated using WebAssembly but share memory directly; to support message passing, Granules offer asynchronous point-to-point communication. Faabric schedules Granules to meet an application's parallelism needs. It also synchronises changes to Granules' shared memory and migrates Granules to improve locality.




CAP-VMs: Capability-Based Isolation and Sharing for Microservices

February 2022 · 324 Reads

Cloud stacks must isolate microservices, while permitting efficient data sharing between isolated services deployed on the same physical host. Traditionally, the MMU enforces isolation and permits sharing at a page granularity. MMU approaches, however, lead to cloud stacks with large TCBs in kernel space, and the page granularity requires inefficient OS interfaces for data sharing. Forthcoming CPUs with hardware support for memory capabilities offer new opportunities to implement isolation and sharing at a finer granularity. We describe cVMs, a new VM-like abstraction that uses memory capabilities to isolate application components while supporting efficient data sharing, all without mandating application code to be capability-aware. cVMs share a single virtual address space safely, each having only capabilities to access its own memory. A cVM may include a library OS, minimizing its dependency on the cloud environment. cVMs efficiently exchange data through two capability-based primitives assisted by a small trusted monitor: (i) an asynchronous read/write interface to buffers shared between cVMs; and (ii) a call interface to transfer control between cVMs. Using these two primitives, we build more expressive mechanisms for efficient cross-cVM communication. Our prototype implementation using CHERI RISC-V capabilities shows that cVMs isolate microservices (Redis and Python) with low overhead while improving data sharing.




Using SMT to accelerate nested virtualization

June 2019 · 357 Reads · 8 Citations

IaaS datacenters offer virtual machines (VMs) to their clients, who in turn sometimes deploy their own virtualized environments, thereby running a VM inside a VM. This is known as nested virtualization. VMs are intrinsically slower than bare-metal execution, as they often trap into their hypervisor to perform tasks like operating virtual I/O devices. Each VM trap requires loading and storing dozens of registers to switch between the VM and hypervisor contexts, thereby incurring costly runtime overheads. Nested virtualization further magnifies these overheads, as every VM trap in a traditional virtualized environment triggers at least twice as many traps. We propose to leverage the replicated thread execution resources in simultaneous multithreaded (SMT) cores to alleviate the overheads of VM traps in nested virtualization. Our proposed architecture introduces a simple mechanism to colocate different VMs and hypervisors on separate hardware threads of a core, and replaces the costly context switches of VM traps with simple thread stall and resume events. More concretely, as each thread in an SMT core has its own register set, trapping between VMs and hypervisors does not involve costly context switches, but simply requires the core to fetch instructions from a different hardware thread. Furthermore, our inter-thread communication mechanism allows a hypervisor to directly access and manipulate the registers of its subordinate VMs, given that they both share the same in-core physical register file. A model of our architecture shows up to 2.3× and 2.6× better I/O latency and bandwidth, respectively. We also show a software-only prototype of the system using existing SMT architectures, with up to 1.3× and 1.5× better I/O latency and bandwidth, respectively, and 1.2--2.2× speedups on various real-world applications.


DATS - Data Containers for Web Applications

March 2018 · 16 Reads · 3 Citations · ACM SIGPLAN Notices

Data containers enable users to control access to their data while untrusted applications compute on it. However, they require replicating an application inside each container, compromising functionality, programmability, and performance. We propose DATS, a system to run web applications that retains application usability and efficiency through a mix of hardware-capability-enhanced containers and the introduction of two new primitives modeled after the popular model-view-controller (MVC) pattern. (1) DATS introduces a templating language to create views that compose data across data containers. (2) DATS uses authenticated storage and confinement to enable an untrusted storage service, such as memcached and deduplication, to operate on plain-text data across containers. These two primitives act as robust declassifiers that allow DATS to enforce non-interference across containers, taking large applications out of the trusted computing base (TCB). We showcase eight different web applications, including Gitlab and a Slack-like chat, significantly reduce the worst-case overheads due to application replication, and demonstrate usable performance for common-case usage.


DATS - Data Containers for Web Applications

March 2018 · 187 Reads · 11 Citations



Direct Inter-Process Communication (dIPC): Repurposing the CODOMs Architecture to Accelerate IPC

April 2017 · 938 Reads · 16 Citations

In current architectures, page tables are the fundamental mechanism that allows contemporary OSs to isolate user processes, binding each thread to a specific page table. A thread cannot therefore directly call another process's function or access its data; instead, the OS kernel provides data communication primitives and mediates process synchronization through inter-process communication (IPC) channels, which impede system performance. Alternatively, the recently proposed CODOMs architecture provides memory protection across software modules. Threads can cross module protection boundaries inside the same process using simple procedure calls, while preserving memory isolation. We present dIPC (for "direct IPC"), an OS extension that repurposes and extends the CODOMs architecture to allow threads to cross process boundaries. It maps processes into a shared address space, and eliminates the OS kernel from the critical path of inter-process communication. dIPC is 64.12× faster than local remote procedure calls (RPCs), and 8.87× faster than IPC in the L4 microkernel. We show that applying dIPC to a multi-tier OLTP web server improves performance by up to 5.12× (2.13× on average), and reaches over 94% of the ideal system efficiency.


Citations (19)


... While this ratio is important, because it predicts the application performance of a partially swapped out application, it is not the only factor to consider when deciding on the granularity. For instance, opting for 2MB pages might result in less cold memory being identified due to hotness fragmentation [12]. Conversely, using 2MB pages accelerates page-table scans owing to smaller page table sizes. ...

Reference:

Flexible Swapping for the Cloud
Reconsidering OS memory optimizations in the presence of disaggregated memory
  • Citing Conference Paper
  • June 2022

... The trends in data center networks have enabled new designs for distributed databases. With the performance improvement of networks, there has been a shift toward disaggregation, separating compute and storage or even accelerator pools from traditional compute [7,12,26,38]. Nowadays, most cloud-native systems are disaggregated since they offer improved resource utilization, as each resource can be scaled independently based on demand. However, since all data is accessed over the network, considerable attention has been paid to developing fast network solutions. ...

Slashing the disaggregation tax in heterogeneous data centers with FractOS
  • Citing Conference Paper
  • March 2022

... Compartmentalization approaches can also enforce additional properties, typically to raise the bar against cross-compartment attacks. Among them, Cross-Compartment Control-Flow Integrity [161,211] (CC-CFI) enforces valid control-flow across compartments: cross-compartment call sites can only call compartment entry-points they would normally call according to the global Control-Flow Graph (CFG). Runtime re-compartmentalization [187,191] enables the policy to change at runtime to achieve more suitable security or performance trade-offs, e.g., as the load evolves. ...

CubicleOS: a library OS with software componentisation for practical isolation
  • Citing Conference Paper
  • April 2021

... By partitioning applications and running sensitive code in the TEE, security and data integrity are increased [26]. Sartakov et al. [61] describe the use of TEEs in clouds to deploy applications that are protected from, e.g., unauthorised access by cloud service providers. ...

Spons & Shields: practical isolation for trusted execution
  • Citing Conference Paper
  • April 2021

... However, currently, there is no mechanism deployed in the host programs that restricts them to obey the reference monitor. The DATS architecture for web applications [49] limits request processing to a restricted set of permissions. Given recent efforts in privilege separation [44], [50], restricting the permissions of host programs after authorization is future work. ...

DATS - Data Containers for Web Applications
  • Citing Conference Paper
  • March 2018

... Virtual prototyping technologies employ various kinds of IPC to establish communication in the virtual environment within those two IPC modes. Threads can now transcend process boundaries thanks to the OS feature known as "direct IPC" [83], which repurposes and expands the CODOMs [84] design. The OS kernel is removed from the critical inter-process communication path, and processes are mapped into a shared address space. ...

Direct Inter-Process Communication (dIPC): Repurposing the CODOMs Architecture to Accelerate IPC
  • Citing Conference Paper
  • April 2017

... In StarPU and OmpSs, the programmers must explicitly specify the available programming systems (e.g., CUDA, OpenCL) in the target system when their runtime libraries are built. In IRIS, on the other hand, the IRIS RTS automatically finds the available programming systems in the target system, dynamically loads their respective shared libraries, and indirectly calls their API functions. Also, StarPU and OmpSs do not support an automatic workload partitioning technique. ...

Automatic Parallelization of Kernels in Shared-Memory Multi-GPU Nodes
  • Citing Conference Paper
  • June 2015

... Besides embedded systems and single-core devices, scratchpads have also been shown to be effective in more complex contexts. For example, it is possible to find proposals of compiler-assisted techniques to optimally exploit on-chip SPMs in many-core architectures with scientific and HPC workloads [18,19]. ...

Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures
  • Citing Article
  • June 2015

ACM SIGARCH Computer Architecture News

... Virtual prototyping technologies employ various kinds of IPC to establish communication in the virtual environment within those two IPC modes. Threads can now transcend process boundaries thanks to the OS feature known as "direct IPC" [83], which repurposes and expands the CODOMs [84] design. The OS kernel is removed from the critical inter-process communication path, and processes are mapped into a shared address space. ...

CODOMs: Protecting software with Code-centric memory Domains
  • Citing Article
  • October 2014

ACM SIGARCH Computer Architecture News